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Area  and  delay  estimates  facilitate  effective  decision-making  ability  in  high  level 
synthesis.  Current  estimation  techniques  focus  on  modeling  the  layout  result  and 
fail  to  deliver  timely  or  accurate  estimates.  This  thesis  presents  a  novel  approach 
to  deriving  these  area  and  delay  estimates  by  modeling  the  actions  and  activities  of 
the  layout  tool,  rather  than  the  layout  result.  This  approach  uses  machine  learning 
techniques  to  analyze  the  input-to-output  relationships  that  result  from  applying  the 
target  layout  tool  to  an  input  design  description  and  producing  a  layout  as  an  output. 
This  thesis  describes  a  solution  architecture  using  these  machine  learning  techniques 
that  captures  the  relationships  between  general  design  features  (e.g.,  topology,  con¬ 
nectivity,  common  input,  and  common  output)  and  layout  concepts  (e.g.,  relative 


placement).  This  solution  architecture  has  the  following  characteristics.  First,  a  set 
of  several  training  designs  captures  the  general  design  features.  The  target  layout 
tool  is  run  on  the  training  designs  and  produces  a  set  of  actual  layouts.  The  general 
design  features  and  relative  placement  concepts  from  the  actual  layouts  makes  up  a 
training  set.  The  formulation  of  this  training  set  is  important  to  adequately  describe 
the  set  of  general  design  features  and  the  associated  layout  concepts.  Second,  a  ma¬ 
chine  learning  system  analyzes  the  training  set  looking  for  relationships  between  the 
design  features  and  layout  concepts.  This  analysis  produces  a  model  of  the  operation 
of  the  layout  tool.  This  model  is  in  the  form  of  a  set  of  production  rules.  Third, 
this  model  is  applied  to  real  designs  to  formulate  area  and  delay  estimates.  This  ap¬ 
proach  is  found  to  produce  accurate  area  and  delay  estimates  very  quickly,  even  for 
designs  with  several  thousand  gates.  Experimentation  illustrates  the  identification  of 
the  general  design  features,  the  relative  placement  concepts,  the  formulation  of  the 
training  set,  the  derivation  of  the  tool  model,  and  the  application  of  this  model  to 
real  world  designs.  In  addition,  two  different  layout  tools  were  modeled  to  show  the 
generality  of  this  approach. 
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1.0  INTRODUCTION 


All  design  automation  systems,  independent  of  the  domain,  must  produce  a  phys¬ 
ical  artifact  with  real  physical  characteristics.  The  earlier  knowledge,  or  estimates, 
of  the  final  physical  characteristics  are  known  and  available  for  use  in  the  design 
automation  process,  the  more  optimal  the  design  process,  as  well  as  the  design  itself. 
Unfortunately,  current  VLSI  CAD  design  automation  systems  fail  to  incorporate  ac¬ 
curate  estimates  early  on  in  the  design  process,  namely,  in  high-level  synthesis.  The 
lack  of  accurate  estimates  at  this  level  cripples  the  ability  of  the  high-level  synthesis 
system  to  produce  optimal  designs  in  a  timely  manner. 

McFarland,  Parker,  and  Camposano  define  high-level  synthesis  as  taking  a 

description  of  a  design’s  behavior  along  with  cost  and  performance  constraints  and 
producing  a  structure  that  implements  the  behavior  while  satisfying  the  constraints. 
In  order  to  find  an  acceptable  structure,  high-level  synthesis  systems  give  developers 
the  ability  to  probe  the  design  space  of  possible  designs  and  to  evaluate  these  designs. 
These  evaluations  are  necessary  to  determine  which  designs  are  acceptable  in  terms 
of  satisfying  cost  and  performance  constraints.  (Cost  and  performance  constraints 
set  the  upper  and/or  lower  bounds  on  physical  characteristics  such  as  total  chip  area 
and  critical  path  delay.)  In  order  to  determine  the  acceptability  of  a  design,  the 
high-level  synthesis  system  must  have  the  capability  to  measure  each  design’s  unique 
physical  characteristics. 

The  most  accurate  measurements  of  a  design’s  physical  characteristics  come  from 
fully  functional  and  fabricated  chips.  However,  the  layout  and  fabrication  process  is 
very  time-consuming.  In  addition,  any  fabricated  chips  that  do  not  meet  the  physical 
constraints  set  by  the  designer  become  costly,  wasted  material.  To  avoid  wasting  time 
and  materials,  estimates  of  the  design’s  physical  characteristics  are  used  in  place  of 
the  actual  values.  Only  those  designs  whose  estimates  satisfy  the  physical  constraints 
are  fabricated  and  fully  measured  for  compliance. 


*  Parenthetical  references  placed  superior  to  the  line  of  text  refer  to  the  bibliography. 
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Although  it  is  possible  to  derive  these  estimates  at  different  levels  of  the  synthesis 
process,  these  estimates  become  more  accurate  as  the  design  gets  closer  to  being 
fabricated.  The  most  accurate  estimates  for  a  target  design  are  based  on  the  layout 
mask  that  chip  manufacturers  use  to  actually  build  the  chips.  However,  producing 
this  layout  mask  is  very  time-consuming  so  it  is  not  very  effective  to  produce  a  layout 
for  each  design  in  the  solution  space.  On  the  other  hand,  high-level  synthesis  systems 
that  derive  estimates  at  levels  higher  than  the  layout  level  to  save  time  compromise 
their  effectiveness  because  the  estimates  are  not  very  accurate.  Therefore,  high-level 
synthesis  systems  can  be  ineffective  in  one  of  two  extremes:  they  either  use  detailed 
information  from  actual  layouts  (time-consuming)  or  they  use  high-level  estimates 
(inaccurate)  to  guide  the  synthesis  process.  These  extremes  define  the  end  points  of 
the  estimator  spectrum.  Current  high-level  synthesis  systems  fall  somewhere  between 
these  two  extremes. 

For  example,  the  Fasolt  system  automatically  optimizes  the  scheduling 

and  bus  topology  of  a  register  transfer  level  design.  Fasolt  uses  information  taken 
from  a  layout  model  to  perform  this  optimization.  However,  Fasolt  uses  a  layout 
estimator  and  not  an  actual  layout  tool  to  produce  the  layout  model.  Knapp  was 
more  interested  in  quicker  turn-around  times  for  layout  information  than  in  high- 
quality  layouts  using  slow  layout  tools.  Fasolt’s  use  of  a  layout  estimator  places 
this  system  near  the  extreme  of  producing  actual  layouts.  (This  and  the  other  two 
systems  addressed  in  this  introduction  are  described  in  more  detail  in  Section  2.1.) 

The  PLEST  system  estimates  a  design’s  layout  area.  This  system  is  based 
on  the  standard  cell  design  style.  PLEST  uses  two  estimators:  one  estimates  wiring 
space  requirements  for  the  routing  channels  and  the  other  estimates  the  number  of 
feedthroughs.  Even  though  this  is  an  estimator  that  does  not  produce  a  layout, 
PLEST  relies  heavily  on  placement  and  wire  routing  techniques.  PLEST  uses  a  1-D 
placement  model  and  folds  it  to  achieve  the  desired  number  of  rows.  PLEST  then 
makes  the  area  estimate  based  on  this  folded  placement  model.  Kurdahi  and  Parker 
claim  that  PLEST  is  a  fast  system  with  an  accuracy  within  10%  of  the  actual  areas. 
The  method,  speed,  and  resultant  accuracy  places  the  PLEST  system  somewhere  in 
the  middle,  away  from  both  extremes.  The  PLEST  system  has  accuracy  approaching 
that  of  actual  layouts,  but  it  is  fast  and  does  not  produce  a  layout. 
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The  ADAM  synthesis  system  predicts  system-level  area  and  delay  without 
producing  any  layout  information  at  all.  The  ADAM  system  uses  a  mathematical 
model  to  predict  an  area-delay  trade-off  curve  for  pipelined  and  nonpipelined  data 
paths.  This  trade-off  curve  determines  the  lower  bound  (theoretically  optimal)  for  a 
given  data  flow  graph  and  a  given  module  set.  Jain,  Parker,  and  Park  admit  their 
area  and  delay  models  are  high-level,  but  the  resultant  curve  serves  to  narrow  the 
search  space  and  arrive  at  an  acceptable  solution  quickly.  The  high-level  nature 
of  this  area  and  delay  predictor  places  the  ADAM  system  closer  to  the  high-level 
estimator  extreme. 

Each  of  these  systems  resides  in  a  different  place  in  the  area  and  delay  estimator 
spectrum.  The  optimal  location  in  this  spectrum  is  where  estimates  are  both  accurate 
(based  on  actual  layouts)  and  quickly  obtained  (no  layouts  produced).  This  thesis 
describes  a  method  of  quickly  obtaining  accurate  area  and  delay  estimates  with¬ 
out  producing  a  layout.  This  places  this  method  near  the  middle  of  the  estimator 
spectrum.  In  addition,  this  is  a  novel  method  because  it  uses  machine  learning  tech¬ 
niques  to  model  the  input-to-output  relationships  that  result  from  applying  layout 
tools,  rather  than  analyzing  the  layout  artifacts.  We  have  demonstrated  that  machine 
learning  techniques  make  it  possible  to  learn,  for  a  given  layout  tool,  what  physical 
and  graphical  (size,  topology,  connectivity,  etc.)  features  in  a  set  of  training  designs 
affect  the  relative  placement  of  nodes  in  the  resultant  layouts.  This  information  (rep¬ 
resented  in  the  form  of  production  rules)  becomes  a  model  that  can  be  used  in  place 
of  the  layout  tool  to  derive  area  and  delay  estimates.  When  a  designer  applies  this 
model  to  a  target  design,  the  rules  predict  the  relative  placement  of  nodes  based  on 
what  was  learned  from  the  training  sets.  The  estimator  uses  this  relative  placement 
information  to  estimate  area  and  critical  path  delay  for  the  target  design.  We  have 
shown  that  applying  the  model  and  obtaining  area  and  delay  estimates  takes  much 
less  time  than  producing  an  actual  layout.  In  addition,  the  area  and  delay  estimates 
are  accurate  because  they  are  based  on  information  learned  from  actual  layouts.  Ac¬ 
cording  to  a  paper  on  layout  estimation  that  Nourani  and  Papachristou  presented  at 
the  30th  ACM/IEEE  Design  Automation  Conference  in  1993,  a  difference  of  less 
than  10%  is  considered  very  good,  but  difficult  to  achieve.  A  difference  of  20-30%  is 
considered  by  the  community  to  be  both  acceptable  and  achievable.  Therefore, 
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as  a  challenge  to  this  research,  we  have  established  a  goal  of  no  more  than  30%  error 
in  the  area  and  delay  estimates  while  trying  to  get  most  estimates  within  a  10%  error. 

The  remainder  of  this  thesis  is  organized  as  follows.  Chapter  2  discusses  back¬ 
ground  information  on  area  and  delay  estimation  techniques  in  current  high-level  syn¬ 
thesis  systems,  layout  tools,  and  machine  learning.  Chapter  3  presents  an  overview  of 
the  tool  modeling  method  and  the  general  solution  architecture.  Chapter  4  describes 
the  solution  architecture  in  detail,  as  it  was  implemented  for  this  research.  Chapter  5 
describes  the  set  of  experiments  that  helped  to  develop  and  validate  this  method  and 
their  results.  Chapter  6  summarizes  this  thesis  and  discusses  the  contributions  from 
this  research,  parts  of  the  research  that  we  could  have  done  differently,  and  ends  by 
suggesting  future  research. 
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2.0  BACKGROUND 


The  area  and  delay  estimation  method  developed  by  this  research  uses  machine 
learning  techniques  to  model  the  input-to-output  relationships  that  result  from  using 
layout  tools.  Using  a  model  instead  of  an  actual  layout  tool,  VLSI  chip  designers  can 
obtain  layout  information  much  faster  than  producing  an  actual  layout.  The  designers 
can  also  quickly  obtain  accurate  area  and  delay  estimates  from  this  layout  information 
and  use  these  estimates  to  guide  design  decisions  in  high-level  synthesis.  Therefore, 
this  research  is  based  on  prior  work  in  the  fields  of  area  and  delay  estimation  in 
high-level  synthesis,  layout  tools,  and  machine  learning.  Section  2.1  discusses  prior 
work  in  high-level  synthesis  focusing  on  claims,  successes,  and  limitations  of  major 
thrusts  in  area  and  delay  estimation.  Section  2.2  discusses  specifics  on  layout  tools. 
Section  2.3  describes  machine  learning  in  general  and  discusses  prior  work  involving 
the  machine  learning  system  used  in  this  research. 

2.1  Area  and  Delay  Estimation  in  High-Level  Synthesis 

McFarland,  Parker,  and  Camposano  (1’  define  high-level  synthesis  as  taking  a 
description  of  a  design’s  behavior  along  with  cost  and  performance  constraints  and 
producing  a  structure  that  implements  the  behavior  while  satisfying  the  constraints. 
In  order  to  do  this,  a  high-level  synthesis  system  must  have  the  capability  to  evaluate 
the  cost  and  performance  of  designs  in  the  solution  space.  (For  this  research,  cost 
is  evaluated  in  terms  of  area  and  performance  is  evaluated  in  terms  of  critical  path 
delay.)  To  save  valuable  design  time  in  exploring  a  large  solution  space,  the  high- 
level  synthesis  system  uses  estimates.  This  section  discusses  several  area  and  delay 
estimators  and  how  they  can  be  used  in  high-level  synthesis  systems. 

The  Fasolt  system  is  a  data-path  optimizer.  It  uses  a  layout  estima¬ 

tor  to  produce  a  layout  model.  This  layout  estimator  is  a  compromise  between  a 
floorplanner  and  a  macro  cell  placement  tool.  According  to  Knapp,  “It  generates  a 
macrocell-oriented  placement;  performs  module  orientation,  aspect  ratio  assignment. 
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and  pin  assignment;  constructs  a  channel  graph;  and  estimates  channel  widths  by 
performing  loose  global  routing.”  Knapp  considered  using  a  slower,  high-quality  lay¬ 
out  tool,  but  decided  to  use  a  primitive  layout  estimator  instead.  He  did  this  to  save 
time  in  exploring  the  design  space. 

The  layout  estimator  can  use  either  a  min-cut  placement  or  simulated 

annealing  placement  strategy,  selected  by  the  user.  A  Steiner  tree 

approximation  is  used  for  multi-pin  nets  in  global  routing.  The  channel  widths  are 
estimated  by  summing  all  wires  in  the  channel.  This  is  based  on  a  pessimistic  as¬ 
sumption  that  all  wires  extend  along  the  entire  length  of  the  channel.  All  of  this 
information  makes  up  the  layout  model.  The  layout  model  provides  estimations  of 
low-level  physical  properties  so  Fasolt  can  evaluate  the  area  and  delay  of  a  particular 
data-path  optimization. 

However,  even  though  the  layout  estimator  does  not  produce  a  layout,  it  still 
takes  a  long  time  to  perform  placement  and  routing.  This  makes  the  data-path 
optimization  a  slow  process.  As  a  result,  Fasolt  is  limited  to  a  macrocell  layout  style 
so  the  layout  estimator  only  needs  to  handle  a  small  set  of  large  modules.  This 
means  this  type  of  approach  will  not  work  well  in  a  full  custom  gate  or  standard  cell 
environment.  In  addition,  Knapp  did  not  compare  results  from  the  layout  estimator 
to  results  from  a  real  layout  tool.  Knapp  did  state  this  would  be  part  of  planned 
future  research. 

In  the  ADAM  synthesis  system,  a  prediction  tool  that  focuses  on  the  sums 
of  areas  and  the  sums  of  delays  for  individual  components  is  used  to  predict  a  lower 
bound  area-delay  trade  off  curve.  A  lower  bound  curve  means  that  all  design  points 
must  lie  on  or  above  the  curve.  All  design  points  on  the  curve  represent  optimal 
designs.  Jain,  Parker,  and  Park  admit  this  is  a  high-level  predictor  and  it  only 
considers  whole  components  when  predicting  the  total  area  and  delay.  While  it 
serves  to  narrow  the  search  space  and  help  the  system  to  arrive  at  an  acceptable 
solution,  it  does  not  come  close  to  the  accuracy  or  design  quality  obtained  from  a 
full-custom  layout.  In  addition,  while  it  considers  functional  units  and  controllers 
in  the  area  and  delay  predictions,  it  does  not  consider  registers,  multiplexers,  buses, 
or  wiring.  Also,  because  this  system  only  produces  a  lower  bound  curve,  the  only 
way  to  compare  it  to  other  layout  tools  is  to  plot  a  similar  curve.  While  Jain  et.  al. 
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plotted  curves  for  other  layout  tools  and  compared  them  to  curves  from  the  ADAM 
system,  no  percentage  differences  were  calculated. 

The  PLEST  system  only  estimates  the  area  requirements  of  VLSI  designs. 
This  system  is  based  on  standard  cells.  PLEST  uses  two  estimators:  one  estimates 
wiring  space  requirements  for  the  routing  channels  and  the  other  estimates  the  num¬ 
ber  of  feedthroughs.  PLEST  uses  a  1-D  placement  model  and  folds  it  to 

achieve  the  desired  number  of  rows.  A  1-D  placement  is  easier  to  model  than  a  2-D 
placement  and  Kurdahi  and  Parker  claim  this  allows  them  to  analyze  the  design  bet¬ 
ter  and  produce  confident  area  estimates.  Kurdahi  and  Parker  also  claim  PLEST  is 
a  fast  system  with  an  accuracy  within  10%  of  the  actual  areas. 

In  the  TELE  2.0  system  Ramachandran  and  Kurdahi  use  a  combination 

of  constructive  (partial  slicing  tree)  and  analytical  (Rent’s  rule  approaches  to 
estimate  wire-length.  A  slicing  tree  is  built  from  the  netlist  information.  However, 
the  design  is  sliced  only  to  a  small  number  of  levels.  They  call  the  leaves  of  the  tree 
the  leaf  clusters.  Analytical  area  and  shape  function  estimators  are  applied 

to  the  leaf  clusters  to  obtain  area  estimates.  By  traversing  the  tree  from  the  leaf 
clusters  to  the  root,  TELE  2.0  can  find  the  shape  function  for  the  root  node.  Once 
this  is  done,  TELE  2.0  can  approximate  the  location  of  the  leaf  clusters.  This  location 
information  determines  the  layout  net  lengths  between  leaf  clusters.  The  length  of  the 
wires  inside  the  leaf  clusters  is  estimated  by  using  Rent’s  rule.  TELE  2.0  uses  these 
wire-length  estimates  along  with  individual  cell  delays  to  provide  a  delay  time  for  the 
worst  case  delay  path  in  the  design  based  on  the  circuit  topology.  Ramachandran  and 
Kurdahi  claim  TELE  2.0  is  a  fast  and  reasonably  accurate  tool  (7%  or  better),  but 
only  for  small  designs  (less  than  1800  cells).  TELE  2.0  is  also  limited  to  a  standard 
cell  environment. 

Ramachandran  et.  al.  also  have  a  tool  that  models  layout  area  and  delay 

for  structural  register-transfer  level  designs.  It  uses  constructive  techniques  to  incor¬ 
porate  floorplanning  information  into  the  area  estimation  model.  To  its  advantage, 
this  system  models  the  required  area  for  the  data-path,  controller,  macrocells,  and 
memories.  It  can  also  directly  estimate  the  total  chip  area.  This  system  calculates 
delay  in  three  parts:  intrinsic  cell  delay,  fanout  delay,  and  wire  length  delay.  The 
intrinsic  cell  delay  and  fanout  delay  comes  from  a  cell  library  and  the  netlist.  The 
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wire  delay  is  based  on  constant  width  wires  and  a  lumped  RC  model  where 

the  capacitance  and  the  resistance  are  directly  proportional  to  the  length  of  the  wire. 
The  approximate  layout  topology  generated  by  the  area  model  determines  the  wire 
length  estimate. 

While  the  authors  report  worst  case  errors  of  13%  for  area  estimates  and  10% 
for  delay  estimates,  they  state  that  they  cannot  make  a  general  statement  about 
the  efficiency  of  this  system  because  it  has  only  been  tested  with  a  limited  range  of 
designs  containing  less  than  2500  cells. 

Nourani  and  Papachristou  developed  an  algorithm  that  estimates  layouts 
for  register-transfer  level  datapaths  using  standard  cells.  It  uses  a  non-probabilistic 
analytical  formula  in  a  constructive  algorithm.  The  estimation  algorithm  executes  in 
two  phases.  Phase  1  places  all  cells  in  one  row  and  Phase  2  folds  them  into  the  user- 
specified  aspect  ratio.  These  are  the  classical  placement  phases  for  estimators,  but 
Nourani  and  Papachristou  claim  theirs  is  different.  They  do  not  use  any  probabilistic 
methods  to  estimate  wire  lengths.  Instead  they  use  information  about  the  input  and 
outputs  ports  of  the  cells  from  the  cell  library.  They  also  use  a  flexible  layout  model 
for  the  components  and  do  not  need  to  consider  feedthroughs.  This  requires  less 
computations  and  saves  time.  Another  difference  is  that  they  can  estimate  layouts 
based  on  either  area  minimization  or  delay  minimization,  while  using  the  same  model. 

Nourani  and  Papachristou  claim  their  area  estimates  are  within  12%  of  actual 
layouts,  hut  they  can  only  show  this  for  small  designs  (less  than  2100  cells).  They 
also  claim  that  it  can  be  extended  to  handle  full-custom  cells,  hut  they  must  define 
these  cells  in  terms  of  standard  cells. 

None  of  the  systems  described  above  considered  modeling  the  relationships  that 
result  from  using  layout  tools  instead  of  analyzing  the  target  designs  or  estimating 
layouts  to  gain  a  time  and  resources  advantage.  Our  research  shows  that  modeling 
these  relationships  and  using  the  model  in  place  of  the  layout  tool  is  a  reasonable 
and  efficient  alternative  in  estimating  the  area  and  delay  of  target  designs. 


9 


2.2  Layout  Tools 

Layout  tools  perform  the  last  step  in  the  VLSI  physical  design  process 
Layout  tools  transform  a  low  level  circuit  description  into  a  mask  that  chip  manu¬ 
facturers  use  to  guide  the  chip  fabrication  process.  The  circuit  description  may  be  a 
net  list  of  primitive  gates  or  standard  cells  or  even  higher  level  macro  cells.  The  layout 
tool  analyzes  this  netlist  and  performs  two  tasks:  placement  and  routing.  The  layout 
tool  places  the  gates  or  cells  in  such  a  manner  that  some  cost  function  is  satisfied. 
This  cost  function  could  be  a  function  of  the  total  area  of  the  layout  or 

a  function  of  the  critical  path  delay  (^7,  31,  32)  gates  or  cells  are  placed, 

the  layout  tool  routes  the  appropriate  wires  between  the  gates  or  cells.  This  routing 
is  also  performed  to  satisfy  some  cost  function. 

However,  many  types  of  layout  problems  such  as  global  routing  and 

minimizing  the  channel  width  in  Manhattan  routing  are  known  to  be  NP- 

complete  This  makes  it  very  difficult  to  find  the  exact  optimal  solution.  There¬ 
fore,  we  require  heuristic  techniques  to  search  the  solution  space  in  an  attempt  to 
find  an  acceptable  solution.  In  some  cases,  the  solution  space  is  quite  large  and  even 
with  a  heuristic  search,  it  takes  a  lot  of  computational  time  and  effort  to  find  an 
acceptable  solution. 

2.2.1  Simulated  Annealing 

One  such  heuristic  technique  is  simulated  annealing  This  technique 

incorporates  a  probabilistic  hill-climbing  search  method  The  simulated  an¬ 

nealing  algorithm  is  shown  in  Figure  1.  The  first  step  is  to  build  a  random  initial 
configuration  (in  this  case,  an  initial  layout).  This  becomes  the  current  configuration. 
A  new  configuration  is  built  by  randomly  permuting  the  current  configuration.  The 
new  configuration  is  evaluated  to  see  if  it  is  the  best  at  satisfying  the  cost  function 
so  far.  If  it  is  the  best,  it  is  saved.  Then  the  algorithm  determines  if  the  new  config¬ 
uration  should  replace  the  current  configuration.  It  does  this  through  an  acceptance 
function.  The  acceptance  function  algorithm  is  shown  in  Figure  2.  The  decision 
to  accept  the  new  configuration  is  based  on  the  cost  of  the  current  configuration, 
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the  cost  of  the  new  configuration,  and  the  current  temperature.  At  first,  the  tem¬ 
perature  is  a  large  value.  When  the  temperature  is  high,  the  new  configuration  is 
more  likely  to  be  accepted,  even  if  it  has  a  worse  cost  than  the  current  configuration 
(hill-climbing).  As  the  simulated  annealing  process  continues,  the  temperature  is 
cooled  down  according  to  a  cooling  schedule  (the  update  function  line  in  Figure  1). 
As  the  temperature  cools,  the  likelihood  of  accepting  new  configurations  decreases. 
Therefore,  the  algorithm  starts  to  behave  like  a  greedy  algorithm  in  the  final  stages. 

Simulated-Annealing (TO) 

■C 

Build  initial  configuration  cur_conf; 

T  =  TO; 

sav_conf  =  cur_conf ; 

while  (stopping  criterion  is  not  satisfied)  { 
while  (inner-loop  criterion  is  not  satisfied)  •{ 
new.conf  =  permute (cur_conf ) ; 
if  (cost(new_conf )  <  cost(sav_conf ) )  { 
sav_conf  =  new_conf; 

} 

if  (Accept ( cost (cur_conf ) ,  cost(new_conf ) ,  T)  ==  TRUE)  { 
cur.conf  =  new_conf; 

> 

} 

T  =  update (T); 

} 

}  /*  end  Simulated-Annealing  */ 

Figure  1  Simulated  Annealing  Algorithm 

While  this  is  a  simple  straightforward  algorithm,  the  number  of  permutations 
required  to  produce  quality  results  grows  with  the  size  of  the  layout.  In  addition, 
evaluating  the  cost  function  for  large  layouts  can  be  very  time-consuming.  While  they 
can  produce  quality  results,  layout  tools  based  on  simulated  annealing  take  a  long 
time.  For  example,  Artistll  uses  a  variant  of  simulated  annealing.  As 

described  in  Section  3. 3. 2.1,  Artistll  required  a  couple  of  weeks  of  CPU  run  time  to 
process  a  design  with  only  about  3000  gates.  This  time  requirement  makes  it  difficult 
to  use  area  and  delay  values  based  on  actual  layouts  in  high  level  synthesis.  However, 
modeling  the  relationships  resulting  from  using  the  layout  tool  and  using  this  model 


11 


Accept (cur_cost,  new_cost,  T) 

{ 

delta.cost  =  cur.cost  -  new_cost; 
y  =  exp(-delta_cost/T) ; 
r  =  random(0,l); 
if  (r  <  y)  { 
return  TRUE; 

}  else  {. 

return  FALSE; 

} 

}  /*  end  Accept  */ 


Figure  2  Accept  Function  Algorithm 

in  place  of  the  tool  to  produce  layout  information  saves  time.  The  next  two  sections, 
Section  2.2.2  and  Section  2.2.3,  describe  the  two  target  layout  tools  we  used  in  this 
research. 

2.2.2  Artistll 

Artistll  is  a  second-generation  CMOS  layout  tool  developed  at  the  Pennsylvania 
State  University  A  detailed  descript  ion  of  the  operation  of  Artistll 

is  given  in  the  appendix.  This  layout  tool  uses  a  variant  of  simulated  annealing  to 
automatically  map  a  given  design  into  a  layout  while  optimizing  the  layout  area. 
Artistll  analyzes  a  transistor-level  net-list  description  of  the  target  design  op¬ 
tional  gate-clustering  information  and  a  set  of  user  specified  parameters 

including  the  number  of  trial  placements  to  evaluate  and  the  number  of  rows  to  use 
in  the  layout.  The  output  is  an  object-based  description  of  the  smallest  layout  found 
during  the  user  specified  number  of  trial  placements.  Artistll  also  includes  a  transla¬ 
tor  that  translates  this  object-based  description  into  a  Magic  format  layout 

file  and  a  grouping  description  file.  The  grouping  file  indicates  the  row  and  column 
location  of  each  primitive  gate  in  the  layout. 

Artistll  was  developed  for  use  on  small  to  medium  size  designs  using  no  more 
than  a  few  thousand  primitive  gates.  It  works  best  on  small  designs  (fewer  than 
100  gates),  but  it  is  capable  of  handling  larger  designs  as  well.  The  larger  designs 
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require  longer  running  times  for  good  quality  results.  Table  1  shows  the  run  times 
(in  CPU  seconds)  for  a  set  of  designs  used  in  this  research.  These  run  times  were 
obtained  by  running  Artistll  on  a  Sun  SPARCstation  2  with  593  Megabytes  of  swap 
space.  The  first  column  lists  the  names  of  these  designs.  The  names  also  indicate 
the  function  of  the  design  (i.e.,  multl6  indicates  a  16-bit  multiplier  design).  The 
second  column  shows  the  number  of  primitive  gates  in  the  design.  The  third  column 
shows  the  number  of  transistors.  The  last  column  shows  the  required  run  time  for 
Artistll.  For  example,  the  design  multS  has  711  gates  made  up  of  2574  transistors.  It 
took  Artistll  47696  CPU  seconds  (a  little  over  13  CPU  hours)  to  produce  a  layout. 
All  of  the  run  times,  except  for  the  last  three  designs,  are  based  on  using  10,000 
iterations.  (Artistll  considers  an  evaluation  of  one  trial  placement  as  an  iteration. 
Therefore,  one  iteration  would  equate  to  one  pass  through  the  outer  loop  in  the 
simulated  annealing  algorithm  shown  in  Figure  1.)  The  last  three  designs,  multlO, 
big,  and  realbig  use  only  100  iterations.  This  is  because  the  run  time  of  Artistll  using 
10,000  iterations  for  these  designs  made  it  impossible  to  obtain  a  layout  in  the  local 
computing  environment. 

This  table  shows  that  the  run  time  for  Artistll  can  become  prohibitively  long,  even 
for  medium  size  designs  having  only  several  hundred  gates.  It  would  be  impractical  to 
expect  a  designer  to  wait  this  long  for  results,  especially  if  there  are  several  designs  in 
the  solution  space  requiring  evaluation.  It  would  be  better  to  use  a  model  of  Artistll 
to  quickly  obtain  layout  information  for  the  designs  in  the  design  space,  and  then  use 
Artistll  to  produce  a  quality  layout  for  the  selected  design. 

2.2.3  TimberWolf 

The  TimberWolf  layout  tool  was  developed  as  a  timing  driven  placement  and 
global  routing  package  It  is  capable  of  handling  any  of  the  row  based 

designs  including  standard  cells,  gate  arrays,  and  sea-of-gates  designs.  TimberWolf  is 
also  applicable  to  floorplanning  problems  and  may  be  used  to  completely  place  and 
globally  route  mixed  macro/standard  cell  designs. 

Placement  of  standard  cells  and  global  routing  in  TimberWolf  takes  place  over 
three  distinct  stages.  In  the  first  stage,  TimberWolf  uses  simulated  annealing  to  place 
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Table  1  Execution  Times  for  Artistll 


Design 

Name 

Number 
of  Gates 

Number  of 
Transistors 

Execution 
Time  (CPU  sec) 

add4 

43 

146 

840 

adds 

95 

330 

2278 

add  12 

147 

514 

5385 

addl6 

199 

698 

8278 

add24 

303 

1066 

11886 

add32 

407 

1434 

18735 

absl6 

208 

638 

6676 

inc4 

23 

68 

405 

incS 

51 

156 

980 

incl6 

107 

332 

3408 

sub4 

48 

160 

828 

subs 

104 

352 

2374 

sub  16 

216 

736 

7788 

addsub4 

64 

224 

1398 

addsubS 

132 

464 

4896 

addsublb 

268 

944 

10224 

mult  2 

40 

112 

653 

mult4 

163 

550 

4106 

mult  6 

389 

1378 

25213 

mult  8 

711 

2574 

47696 

mult  16 

2959 

11038 

28746 

big 

6264 

23388 

65493 

realbig 

12528 

46776 

359289 

cells  while  minimizing  the  total  interconnect  cost.  In  the  second  stage,  TimberWolf 
inserts  feed-through  cells  as  required  while  continuing  to  minimize  total  interconnect 
cost  through  simulated  annealing.  TimberWolf  also  uses  uncommitted  feed-throughs 
from  the  standard  cells  in  place  of  additional  feed-throughs  whenever  possible.  In 
the  third  stage,  TimberWolf  makes  local  changes  to  the  placement  if  these  changes 
lead  to  a  reduction  in  the  number  of  wiring  tracks. 

Sechen  claims  that,  when  compared  to  other  automatic  and  manual  layout 

methods,  TimberWolf  yielded  an  area  savings  from  15  to  75  percent.  He  also  claims 
the  global  router  reduced  the  number  of  wiring  tracks  by  an  additional  7  to  16  percent 
when  compared  to  the  total  interconnect  length  minimization  alone.  In  addition,  he 
claims  the  third  stage  (local  placement  refinement)  reduced  the  number  of  wiring 
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tracks  another  8  percent. 

Now  that  we  have  described  what  we  are  trying  to  model,  the  next  section  de¬ 
scribes  various  machine  learning  systems  that  we  could  use.  The  next  section  first 
describes  what  machine  learning  is  all  about  and  then  describes  three  different  im¬ 
plementations  of  machine  learning.  The  section  finishes  by  describing  the  machine 
learning  system  we  used  in  this  research. 

2.3  Machine  Learning 

A  machine  learning  system  is  a  computer  program  that  makes  decisions  based 
on  the  accumulated  experience  contained  in  successfully  solved  cases 
This  experience  is  represented  as  a  set  of  patterns  of  data  with  correct  classifications 
(based  on  the  solution)  for  each  data  pattern.  Thus,  learning  becomes  a  classification 
problem.  The  learning  system  decides  which  classification  to  make  for  a  given  data 
pattern.  This  classification  problem  is  called  a  prediction  problem  in  statistics  and 
concept  learning  in  machine  learning.  Each  data  pattern  has  a  particular  classification 
or  concept  associated  with  it.  The  task  of  the  learning  system  is  to  learn  what  data 
patterns  produce  which  concepts. 

The  data  patterns  represent  observations  about  certain  features  in  the  target 
domain.  Each  feature  is  described  by  an  attribute  (color,  size,  shape,  etc.),  which 
can  take  on  a  value  (red,  big,  square,  etc.).  Of  course,  each  feature  has  a  range  of 
admissible  values.  These  values  may  be  discrete,  as  in  color,  or  continuous,  as  in  size. 
Each  set  of  feature  values  and  their  associated  concept  is  called  a  training  instance. 
A  complete  set  of  training  instances  (accumulated  experience)  is  called  a  training  set. 
All  training  instances  in  the  training  set  must  have  the  same  set  of  features,  but  not 
necessarily  the  same  set  of  values. 

The  learning  system  analyzes  the  training  set  and  produces  a  classifier.  This  clas¬ 
sifier  makes  classification  decisions  based  on  the  learning  system’s  acquired  knowledge 
of  the  solved  cases.  The  classifier  takes  a  set  of  feature  values  as  input  and  produces 
a  concept  as  output. 
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Producing  a  classifier  would  be  an  easy  task  if  the  learning  system  had  an  unlim¬ 
ited  number  of  training  instances  from  which  to  learn  and  if  the  domain  knowledge 
contained  in  these  training  instances  was  complete.  However,  in  the  real  world  this  is 
never  the  case.  The  learning  system  has  only  a  limited  number  of  training  instances 
and  the  domain  knowledge  is  lacking  in  some  respect.  Therefore,  there  are  two  basic 
problems  in  machine  learning.  One  is  trying  to  generalize  decisions  based  on  only  a 
sample  of  specific  training  instances.  The  other  is  trying  to  make  the  correct  decisions 
based  on  incomplete  knowledge. 

If  the  training  set  could  include  all  possible  training  instances  with  the  correct 
concepts,  the  classifier  could  be  implemented  as  a  simple  lookup  table.  The  classifi¬ 
cation  process  would  consist  of  matching  the  input  set  of  features  with  the  same  set 
from  the  lookup  table  and  using  the  associated  concept  as  the  output.  This  would 
always  produce  correct  results.  However,  unless  the  number  of  features  and  concepts 
for  a  particular  domain  is  very  small,  this  type  of  pattern  matching  becomes  a  diffi¬ 
cult  problem.  It  becomes  impossible  if  any  of  the  values  are  continuous.  Therefore, 
because  training  sets  can  reflect  only  a  sample  of  all  possible  training  instances,  the 
problem  becomes  one  of  trying  to  generalize  decisions  based  on  a  limited  number  of 
specific  training  instances.  This  makes  the  classifier  more  complicated  and  introduces 
errors. 

The  error  rate  of  a  learning  system  is  the  most  commonly  used  measure  of  success 
or  failure  When  the  classifier  tries  to  classify  an  unknown  case,  it  will  either 

make  the  correct  choice  or  it  will  make  an  incorrect  choice.  The  error  rate  is  defined 
as  the  ratio  of  the  number  of  incorrect  choices  to  the  total  number  of  unknown  cases. 
A  true  error  rate  could  be  calculated  if  the  there  were  an  unlimited  number  of  cases, 
all  with  the  correct  concept  already  known.  However,  this  is  not  practical,  so  an 
apparent  error  rate  is  derived.  The  apparent  error  rate  of  a  classifier  is  the  error 
rate  of  the  classifier  on  the  training  set  This  makes  sense  because  as  the 

number  of  instances  in  the  training  set  approaches  infinity,  the  apparent  error  rate 
will  eventually  become  the  true  error  rate.  In  this  research,  we  determine  the  error 
rate  of  the  learning  system  in  a  slightly  different  way.  Instead  of  measuring  how  a 
classifier  performs  on  the  training  set,  we  measure  the  effects  of  its  performance  on 
real  cases.  That  is,  we  measure  how  well  our  area  and  delay  estimator  performs. 
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given  predicted  layout  information  from  the  learning  system.  This  is  explained  in 
more  detail  in  Section  5.2. 

Another  problem  arises  when  the  acquired  knowledge  about  a  domain  is  incom¬ 
plete  and  the  classifier  cannot  make  a  decision  with  certainty  For  example, 

the  same  set  of  feature  values  may  show  up  in  different  training  instances  with  dif¬ 
ferent  observed  concepts.  In  this  case,  the  classifier  must  determine  which  concept 
is  most  likely  to  occur  for  that  particular  set  of  feature  values  when  encountered 
in  the  unclassified  cases.  This  introduces  the  need  for  expressing  confidence  in  the 
classifier’s  decisions.  This  problem  is  solved  by  using  certainty  factors. 

Certainty  factors  combine  degrees  of  belief  and  disbelief  about  a  decision’s  truth 
into  a  single  number  This  number  allows  comparisons  of  competing 

decisions.  When  the  learning  system  produces  a  decision,  it  assigns  the  decision 
a  certainty  factor  based  on  acquired  knowledge  in  the  training  set.  Then,  in  the 
classifier,  if  different  decisions  produce  different  concepts  for  the  same  set  of  feature 
values,  certainty  factors  help  to  select  a  concept.  For  instance,  if  a  decision  indicating 
one  concept  for  a  given  data  pattern  has  a  high  certainty  factor  and  another  decision 
indicating  another  concept  for  the  same  data  pattern  has  a  low  certainty  factor,  only 
the  decision  with  the  high  certainty  factor  is  used.  This  is  known  as  the  single  best 
decision  scheme  The  decision  with  the  single  best  certainty  factor  is  used. 

Certainty  factors  may  also  be  used  in  a  weighted  voting  scheme  In  this 

scheme,  the  certainty  factors  of  all  decisions  indicating  the  same  concept  for  a  par¬ 
ticular  data  pattern  are  combined  into  a  single  certainty  factor.  This  is  done  for 
all  decisions  bearing  on  that  data  pattern.  The  concept  with  the  highest  combined 
certainty  factor  is  selected  as  the  concept  for  the  data  pattern.  (This  is  the  method 
we  use  in  this  research.  This  is  explained  in  more  detail  in  Section  3. 3. 2. 2.) 

Now  that  we  have  presented  a  general  overview  of  machine  learning,  we  can  look 
at  specific  implementations.  There  are  three  major  techniques  for  implementing 
machine  learning  classifiers.  These  techniques  are  linear  discriminants,  neural  nets, 
and  production  rules.  The  following  sections  describe  these  techniques. 
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2.3.1  Linear  Discriminants 

Linear  discriminants  are  the  most  common  form  of  classifier  In  these 

systems,  a  linear  combination  of  the  feature  values  discriminates  the  classes.  If  a 
domain  has  d  features,  then  a  d— 1  dimensional  hyperplane  separates  the  classes.  The 
general  form  is  given  in  Equation  2-1,  where  (ci,  e^,  ...ed)  is  the  set  of  feature  values, 
d  is  the  number  of  features,  and  Wi  are  constant  weights  learned  from  the  acquired 
knowledge  (training  set)  for  the  domain.  Equation  2-1  implements  a  weighted  sum 
of  the  feature  values. 


sum  =  wiCi  +  W2e2...  +  WdCd  —  Wo  (2-1) 

The  learning  system  produces  a  unique  set  of  weights  for  each  class.  The  clas¬ 
sifier  contains  these  sets  of  weights  and  computes  this  sum  for  each  class  on  every 
unknown  case  presented.  The  class  whose  weights  produce  the  highest  sum  becomes 
the  selected  class  for  that  case. 

The  kinds  of  problems  best  solved  by  linear  discriminants  are  ones  in  which  the 
classes  are  normally  distributed  A  linear  discriminant  based  learning  system 

was  used  in  Samuel’s  checkers  program  Basically,  this  program  learned  to  play 
checkers  by  playing  several  games.  When  good  moves  occurred,  features  that  were 
heavily  weighted  in  the  classifier  got  additional  weight.  When  bad  moves  occurred, 
they  lost  some  of  their  weight.  The  classes  describing  the  different  moves  were  as¬ 
sumed  to  be  normally  distributed.  This  was  a  simple  problem  and  only  required 
about  a  day  of  computation  time.  (Of  course,  this  computation  time  is  measured  in 
1959  time  when  computers  were  much  slower.) 

2.3.2  Neural  Nets 

Neural  nets  are  another  type  of  classifier.  A  neural  net  is  a  parallel  distributed  in¬ 
formation  processing  structure  in  the  form  of  a  directed  graph  The  nodes 

of  the  graph  are  the  processing  elements.  The  edges  of  the  graph  are  unidirectional 
connections  between  processing  elements.  A  processing  element  can  have  any  num¬ 
ber  of  input  connections.  A  processing  element  can  also  have  any  number  of  output 
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connections,  but  each  output  must  have  the  same  value.  (The  processing  element 
really  has  only  one  output.  This  output  is  fanned  out  as  needed.)  Each  processing 
element  has  local  memory  and  possesses  a  transfer  function.  This  transfer  function 
uses  information  from  the  local  memory  and  the  inputs  to  produce  new  information 
for  the  local  memory  and  the  output.  Therefore,  the  transfer  function  implements 
the  learning  process.  Learning  takes  place  in  a  neural  net  as  the  local  memory  is 
modified  each  time  a  new  training  instance  is  presented.  The  training  instances  are 
presented  sequentially  and  the  order  does  not  matter. 

The  classic  implementation  of  a  processing  element  is  a  device  called  a  perceptron 
(50,  52,  56,  57)  pigm-e  3  shows  a  typical  perceptron.  A  perceptron  decides  if  an  input 
pattern  belongs  to  one  of  two  classes.  In  this  regard,  a  perceptron  is  the  equivalent 
of  a  linear  discriminant.  Just  as  in  Equation  2-1,  a  perceptron  has  a  fixed  number 
of  inputs  that  correspond  to  the  number  of  features  in  the  domain.  The  inputs  are 
weighted  and  summed  to  produce  an  output.  The  weights  are  stored  in  local  memory 
and  the  transfer  function  performs  the  sum.  If  this  sum  is  greater  than  0,  the  output 
of  the  perceptron  is  assigned  a  value  of  1,  not  the  summed  value.  If  the  sum  is  less 
than  or  equal  to  0,  the  output  of  the  perceptron  is  assigned  a  value  of  0.  The  output 
of  a  perceptron  indicates  to  which  class  the  set  of  inputs  belongs.  If  the  output  of 
the  perceptron  is  1,  the  input  pattern  belongs  to  class  1.  If  the  output  is  0,  the  input 
pattern  belongs  to  class  0. 


Xl  x2 


Figure  3  A  Perceptron 
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During  the  learning  process,  there  is  another  input  to  the  perceptron.  This  input 
indicates  the  correct  output  value  for  the  input  pattern,  based  on  the  observed  cases 
in  the  training  set.  In  the  learning  mode,  the  transfer  function  in  the  perceptron 
computes  the  weighted  sum  of  the  input  pattern  and  compares  the  resultant  output 
to  the  correct  output  value.  If  the  perceptron’s  output  is  the  same  as  the  correct 
output,  the  weights  are  not  changed.  If  the  perceptron’s  output  and  the  correct 
output  are  different,  the  transfer  function  adjusts  the  weights  and  stores  them  back 
in  local  memory.  Thus,  the  next  time  the  perceptron  is  presented  with  the  same 
input  pattern,  it  will  produce  an  output  closer  to  the  correct  one.  The  inventor  of 
the  perceptron,  Frank  Rosenblatt  proved  that  given  linearly  separable  classes, 
a  finite  number  of  training  instances  will  develop  a  set  of  weights  that  will  always 
produce  the  correct  outputs.  This  is  known  as  the  perceptron  convergence  theorem. 
Convergence  to  the  correct  set  of  weights  will  eventually  occur  even  if  the  initial  set 
of  weights  are  randomly  chosen.  However,  it  is  not  known  a  priori  how  many  training 
instances  are  needed  to  converge  or  how  long  it  will  take  to  converge. 

Neural  nets,  unlike  linear  discriminants,  make  no  assumption  about  the  popula¬ 
tion  distribution.  The  classes  do  not  have  to  be  normally  distributed  to  get  good 
results.  Neural  nets  are  well  suited  for  such  problems  as  sensor  processing,  control 
applications,  and  data  analysis  For  example,  one  sensor  processing  applica¬ 

tion  is  character  recognition.  The  user  forms  a  character  on  a  digitizer  tablet  and 
the  neural  net  recognizes  the  character  and  uses  it  as  appropriate.  Of  course,  the 
training  set  would  consist  of  many  such  characters. 

One  drawback  to  both  linear  discriminants  and  neural  nets  is  the  decision  rules 
are  hidden.  In  both  systems,  the  weights  determine  the  classification  for  a  given  set 
of  feature- values.  These  weights  are  based  on  what  was  learned  from  the  training 
set.  But  these  weights  do  not  directly  indicate  what  relationships  exist  between  the 
features  and  the  concepts  according  to  accumulated  experience  in  the  training  set. 
There  is  no  way  to  directly  check  the  logical  correctness  of  the  decisions  based  on  the 
weights.  Therefore,  the  classifications  from  linear  discriminants  and  neural  nets  must 
be  accepted  on  faith  if  there  is  no  known  correct  answer  to  compare  them  against. 
The  decision  rules  could  be  inferred  by  an  involved  mathematical  computation,  given 
the  user  is  so  inclined  and  has  a  mathematics  background. 
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A  better  solution  is  to  implement  a  learning  system  that  produces  rules  directly, 
and  makes  the  rules  compatible  with  human  reasoning  in  a  human  readable  format. 
This  way,  the  user  can  easily  check  the  rules  to  determine  if  the  rules  make  sense  for 
the  domain.  If  the  learning  system  produces  rules  that  do  not  make  sense  for  the 
domain  at  first  glance,  the  rules  can  be  investigated  further.  This  investigation  will 
either  show  that  the  rules  are  not  correct,  or  it  will  show  that  the  rules  have  uncovered 
something  not  well  known  about  the  domain.  In  the  first  case,  we  can  disregard  the 
rules  and  either  adjust  the  learning  process  or  modify  the  training  set  to  produce 
rules  that  do  make  sense.  In  the  second  case,  we  can  add  the  new  information  to  our 
accumulated  knowledge  of  the  domain  and  use  the  rules  with  confidence. 

Due  to  the  novel  use  of  machine  learning  techniques  in  the  VLSI  CAD  domain 
in  this  research,  we  found  it  necessary  to  check  the  rules  directly.  Therefore,  we 
used  a  machine  learning  system  that  produces  human  readable  rules.  Section  2.3.3 
discusses  rule-based  learning  systems  in  general  and  Section  2.3.3. 1  discusses  the 
learning  system  used  in  this  research. 

2.3.3  Rule-Based  Machine  Learning 

In  rule-based  learning  systems,  decision  rules  have  the  general  form  of  an  “IF- 
THEN”  statement  IF  X  is  true,  THEN  classify  as  class  A.  The  simplest 

form  of  the  “IF”  part  is  a  conjunction  (AND)  of  propositions:  IF  X  is  true  AND 
Y  is  false,  THEN  classify  as  class  B.  However,  more  than  one  rule  in  conjunctive 
form  may  be  required  to  properly  handle  a  class.  The  required  rules  may  be  put 
together  in  a  disjunctive  (OR)  set.  This  format  for  combining  rules  into  sets  is  called 
the  disjunctive  normal  form.  Learning  systems  that  produce  rules  in  this  form  make 
it  easy  for  humans  to  understand  the  rules  and  to  have  confidence  in  the  learning 
process. 

There  are  two  basic  techniques  for  implementing  rule-based  learning  systems: 
decision  trees  and  production  rules  Decision  trees  are  directed  graphs 

where  each  node  represents  a  single  decision  and  the  edges  indicate  where  to  go  based 
on  the  decision.  Leaf  nodes  indicate  a  class  decision.  All  paths  in  a  decision  tree  end 
up  at  a  leaf  node.  The  leaf  node  indicates  which  class  is  selected  corresponding  to  a 
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conjunction  of  decisions  made  along  the  path.  If  there  are  multiple  paths  for  a  given 
class,  the  paths  represent  disjunctions.  In  addition,  all  paths  in  a  decision  tree  are 
mutually  exclusive.  One  and  only  one  path  will  eventually  lead  to  a  classification 
for  a  given  set  of  feature- values.  This  also  means  that  only  one  classification  can  be 
selected  for  a  given  set  of  feature- values.  Therefore,  there  is  no  need  for  certainty 
factors  or  rule  weighting  schemes. 

The  goal  of  a  decision  tree  based  learning  system  is  to  learn  the  structure  of  the 
tree.  This  process  is  known  as  rule  induction.  The  tree  is  induced  from  the  training 
set.  A  starting  feature  is  selected  for  the  root  node.  The  root  node  is  split  into 
disjoint  sets  according  to  distinct  values  for  the  feature.  This  process  is  repeated  for 
subsequent  nodes  and  features.  Leaf  nodes  occur  when  all  members  of  the  sample 
belong  to  the  same  class  so  splitting  is  no  longer  necessary.  This  class  becomes  the 
selected  class  for  the  path  that  lead  to  the  leaf  node. 

Using  production  rule  techniques  gives  the  learning  system  more  power.  This  is 
because  production  rules  do  not  have  to  be  mutually  exclusive  like  paths  in  a  decision 
tree.  For  any  given  input  case,  more  than  one  rule  may  fire  for  the  same  class,  or 
two  or  more  rules  may  fire  for  different  classes.  (A  rule  is  said  to  “fire”  when  the 
IF  part  is  satisfied  by  the  input  data  pattern.)  If  two  or  more  rules  fire  indicating 
different  classes,  we  need  an  ordering  scheme  or  certainty  factors  to  help  select  the 
most  appropriate  class. 

Building  a  learning  system  using  production  rules  is  quite  complex.  This  is  be¬ 
cause  it  is  difficult  to  predict  the  interaction  of  rules  that  are  not  mutually  exclusive, 
even  on  the  training  set.  However,  if  an  exhaustive  search  of  all  possible  rule  sets 
in  disjunctive  normal  form  is  implemented,  the  power  of  the  production  rule  system 
becomes  evident.  All  possible  rules  would  be  generated  from  the  most  general  to  the 
most  specific.  These  rules  would  cover  all  of  the  possible  relationships  from  features 
to  concepts.  Of  course  for  nontrivial  domains,  an  exhaustive  search  is  impractical, 
so  we  need  to  use  heuristic  search  techniques.  One  heuristic  technique  is  to  use  a 
beam  search  In  an  exhaustive  search  logical  expressions  are 

generated  and  discarded  if  they  are  no  better  than  the  best  expression  so  far.  All 
possible  expressions  are  generated  and  compared  and  the  best  one  is  kept.  In  a  beam 
search,  the  most  promising  expressions  are  stored.  (An  integer  number  called  a  beam 
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width  indicates  how  many  expressions  to  keep  in  storage.)  Combinations  of  these 
expressions  are  used  to  generate  longer  expressions.  The  most  promising  expressions 
generate  even  longer  expressions  and  so  on,  until  some  previously  defined  expression 
length  is  reached.  Then  the  best  rule  in  this  final  set  is  put  into  the  classifier. 

Section  2.3.3. 1  describes  the  production  rule  learning  system  used  in  this  research. 
This  learning  system  proved  to  be  very  useful  and  quite  powerful  for  our  application. 


2.3,3. 1  Rule  Learner.  The  machine  learning  system  used  in  this  research  is  called 
Rule  Learner  and  was  developed  by  the  Department  of  Computer  Science 

at  the  University  of  Pittsburgh.  It  is  a  descendent  of  the  Meta-DENDRAL  learning 
system  The  Meta-DENDRAL  system  was  successful  in  finding  new  cleavage 

rules  for  use  in  mass  spectrometry.  Mass  spectrometry  is  a  method  for  identifying 
the  chemical  constitution  of  a  substance  by  means  of  the  separation  of  gaseous  ions 
according  to  their  differing  mass  and  charge.  Cleavage  rules  are  used  to  predict  which 
bonds  in  the  substance  will  be  broken 

The  Rule  Learner  system  is  a  knowledge-based  production  rule  system.  Rule 
Learner  learns  a  set  of  rules  based  on  a  training  set  through  heuristic  search  tech¬ 
niques.  The  knowledge  is  made  up  of  the  kinds  of  rules  that  are  plausible  plus  the 
training  set  itself.  There  is  also  a  set  of  user-specified  stopping  conditions  (biases) 
that  indicate  when  a  set  of  rules  adequately  describes  the  training  set.  To  date,  Rule 
Learner  has  been  successfully  used  in  High-Energy  Physics  experiments  It  was 
used  to  diagnose  errors  in  the  beam  line  of  a  particle  accelerator,  design  diagnosable 
beam  lines,  and  design  the  experiments  themselves.  The  high-energy  physics  events 
formed  the  examples  for  learning  and  the  features  used  to  describe  the  examples 
were  physics  quantities.  The  Rule  Learner  system  has  also  been  used  to  predict  the 
carcinogenicity  of  chemicals  and  to  diagnose  errors  in  the  local  loops  of  telephone 
networks. 

However,  the  Rule  Learner  system  does  have  some  limitations.  It  is  unable  to 
learn  rnles  that  relate  one  feature  to  another.  Therefore,  relational  features  must 
be  explicitly  declared  in  the  training  set.  In  addition,  the  running  time  grows  ex¬ 
ponentially  as  a  function  of  the  number  of  conjuncts,  attributes,  and  values.  We 
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can  calculate  the  order  of  this  running  time  for  Rule  Learner’s  search  algorithm 
by  assuming  there  are  a  attributes,  each  having  a  maximum  of  v  nominal  values,  n 
training  instances,  and  a  maximum  of  k  conjuncts  per  rule.  Then  the  search  tree  has 
a  depth  k  and  a  maximum  branching  factor  of  av.  If  an  exhaustive  depth-first  search 
were  used,  the  run  time  would  be  0((au)*’n).  However,  the  Rule  Learner  version  we 
used  in  this  research  implements  a  beam  search.  A  beam  width  of  b  reduces  the  run 
time  to  0{kbav{n+logbav)). 

The  Rule  Learner  system  has  proven  to  be  very  appropriate  for  this  research 
because  it  can  intelligently  search  through  a  large  set  of  data  and  find  rules  that 
explain  the  data.  This  is  well  suited  to  analyzing  training  designs  and  their  resultant 
layouts  so  a  model  of  the  relationships  from  the  layout  tool  can  be  represented  by 
the  rules  learned.  Rule  Learner  also  produces  rules  in  a  clear  and  human-readable 
format  so  we  can  understand  them,  check  them  to  ensure  they  make  sense,  and  use 
them  with  confidence.  Implementation  of  Rule  Learner  in  this  research  is  described 
in  more  detail  in  Section  3.3.2. 
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3.0  RESEARCH  APPROACH 


The  goal  of  this  research  is  to  find  a  method  for  quickly  obtaining  high  quality 
area  and  delay  estimates  so  we  can  increase  the  effectiveness  of  high-level  synthesis 
systems.  Section  2.1  describes  several  area  and  delay  estimators  that  analyze  designs 
and/or  model  layouts.  None  of  these  estimators  consider  modeling  the  input-to- 
output  relationships  that  result  from  using  layout  tools  instead  of  analyzing  the 
design  or  the  layout  artifact.  The  premise  of  this  research  is  a  belief  that  we  can 
model  these  relationships  from  layout  tools  and  use  these  models  in  place  of  the 
tools  to  quickly  obtain  accurate  layout  information.  High-level  synthesis  systems  can 
effectively  use  this  layout  information  to  obtain  area  and  delay  estimates. 

This  approach  has  a  significant  time  advantage  because  it  does  not  produce  an 
actual  layout.  It  does  not  rely  on  time-consuming  placement  and  routing  techniques. 
This  approach  also  has  an  accuracy  advantage  because  it  does  produce  and  use  low- 
level  layout  information  based  on  actual  layouts.  It  does  not  rely  on  inaccurate  and 
high-level  models  of  area  and  delay. 

Having  decided  on  a  method  that  models  layout  tools,  we  considered  the  following 
alternative  approaches: 

•  Use  layout  tool  in  “brain-damaged”  mode. 

•  Use  library  of  designs  and  corresponding  layouts. 

•  Perform  input-to-output  analysis  of  layout  tool. 

Each  of  these  approaches  is  described  in  Section  3.1.  This  section  describes  how 
we  carefully  considered  each  approach  before  choosing  one  for  this  research.  We 
selected  the  input-to-output  analysis  approach  because  it  yields  better  and  faster 
results  than  the  other  two.  This  approach  also  allows  for  greater  flexibility  because 
it  can  be  applied  across  various  design  architectures  and  on  different  layout  tools. 
Following  the  section  describing  the  alternative  approaches,  Section  3.2  presents  an 
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high-level  overview  and  Section  3.3  presents  the  solution  architecture  for  how  we  used 
machine  learning  techniques  to  model  layout  tools. 

3.1  Alternative  Approaches  to  Modeling  Layout  Tools 

One  approach  to  modeling  layout  tools  is  to  use  a  “brain-damaged”  version  of 
the  target  layout  tool.  The  principle  here  is  to  trade  very  accurate  results  for  quickly 
obtained  results.  The  target  layout  tool  is  scaled  back  so  it  produces  a  less  than 
optimal  layout  in  a  shorter  period  of  time.  For  example,  a  layout  tool  that  uses  a 
user-defined  number  of  iterations  in  its  algorithm  may  be  scaled  back  by  decreasing 
the  number  of  iterations.  Table  2  shows  height  and  width  measurements  for  a  sample 
set  of  designs  we  processed  through  Artistll  using  a  different  number 

of  iterations.  (Because  Artistll  uses  a  cost  function  based  on  minimizing  area,  com¬ 
paring  delay  values  is  meaningless.)  The  height  and  width  measurements  come  from 
the  Magic  layout  mask  produced  by  Artistll. 

The  first  column  shows  the  name  of  the  design  and  the  number  of  gates.  The 
second  column  shows  the  area  dimension  (height  or  width)  being  evaluated.  The 
third  column  shows  the  height  and  width  values  (in  lambda  units,  1  lambda  =  2 
microns)  for  only  100  iterations  of  Artistll.  Using  only  100  iterations  of  Artistll  is 
considered  a  “brain-damaged”  mode.  The  fourth  column  shows  the  percent  difference 
between  these  values  using  100  iterations  and  the  values  we  obtained  by  using  10,000 
iterations  of  Artistll  (col.  7).  (We  used  10,000  iterations  as  a  reference  here  because 
we  modeled  Artistll  at  10,000  iterations.)  The  fifth  column  shows  the  height  and 
width  values  for  1000  iterations.  Using  only  1000  is  considered  a  “brain-damaged” 
mode,  but  it  is  a  little  smarter  than  using  only  100  iterations.  The  sixth  column  shows 
the  percent  difference  between  values  using  1000  iterations  and  values  using  10,000 
iterations.  The  seventh  column  shows  the  reference  values  using  10,000  iterations 
of  Artistll.  Because  10,000  iterations  is  the  reference  number  of  iterations  it  is  not 
considered  a  “brain- damaged”  mode. 

For  example,  one  of  the  designs  is  a  16-bit  incrementer  named  incl6.  It  has 
107  gates.  The  reference  values  at  10,000  iterations  are  1115  lambda  for  height  and 
520  lambda  for  width.  Artistll  required  3408  CPU  seconds  to  produce  a  layout  using 
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Table  2  “Brain-Damaged”  Results  from  Artistll 


Design 
Name 
(#  gates) 

Dim. 

100 

Iterations 

Percent 

Error 

1000 

Iterations 

Percent 

Error 

10000 

Iterations 

add4 

ht 

555 

20.92% 

499 

8.71% 

459 

(43) 

wd 

422 

39.27% 

359 

18.48% 

303 

add32 

ht 

3179 

6.71% 

3259 

9.40% 

2979 

(407) 

wd 

1640 

9.33% 

1563 

4.20% 

1500 

abs8 

ht 

1074 

8.81% 

1083 

9.73% 

987 

(104) 

wd 

695 

33.65% 

527 

1.35% 

520 

dec  16 

ht 

963 

39.36% 

723 

4.63% 

691 

(107) 

wd 

1122 

37.84% 

891 

9.46% 

814 

mcl6 

ht 

1339 

20.09% 

1243 

11.48% 

1115 

(107) 

wd 

856 

64.61% 

625 

20.19% 

520 

mult  8 

ht 

3971 

9.48% 

3859 

6.40% 

3627 

(711) 

wd 

3663 

11.98% 

2963 

-9.42% 

3271 

10,000  iterations.  Using  only  100  iterations  which  took  Artistll  only  62  CPU  seconds, 
we  obtained  a  height  of  1339  lambda  and  width  of  856  lambda.  These  differ  from  the 
reference  values  by  20.09%  for  height  and  64.61%  for  width.  Using  1000  iterations 
which  took  Artistll  a  little  longer  (353  CPU  secs.),  we  obtained  1243  lambda  for 
height  and  625  lambda  for  width.  These  differ  from  the  reference  values  by  11.48% 
for  height  and  20.19%  for  width.  This  shows  that  as  the  number  of  iterations  is 
increased,  the  percent  difference  decreases.  It  also  shows  that  as  the  number  of 
iterations  increases,  so  does  the  time  required  for  Artistll  to  produce  a  layout. 

The  results  we  obtained  from  using  a  “brain-damaged”  mode  of  Artistll  are  not 
very  good,  but  may  be  acceptable  as  rough  estimations.  The  biggest  drawback  to 
this  approach  is  it  still  requires  producing  a  layout.  Therefore,  very  large  designs  still 
require  a  substantial  amount  of  time  and  memory,  even  if  the  number  of  iterations 
is  decreased.  In  addition,  as  we  can  see  in  Table  2,  the  accuracy  decreases  as  we 
decrease  the  number  of  iterations. 

Another  approach  is  to  use  a  library  of  designs  and  layouts.  The  target  layout  tool 
could  process  a  set  of  typical  designs  to  generate  a  set  of  corresponding  layouts.  The 
design  descriptions  and  their  corresponding  layouts  would  be  stored  in  the  library.  We 
would  process  a  new  design  by  matching  it  up  with  one  of  the  designs  in  the  library 
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and  using  information  from  the  corresponding  layout  to  estimate  area  and  delay.  One 
drawback  to  this  approach  is  the  amount  of  memory  required  for  a  library  containing 
a  sufficient  set  of  designs  and  layouts.  Another  drawback  is  the  time  and  processing 
required  to  perform  the  extensive  search  through  the  library  while  trying  to  match 
up  designs,  either  in  total  or  in  part. 

The  last  approach  (the  one  we  selected  for  this  research)  is  to  determine  what 
input-to-output  relationships  result  from  using  the  layout  tool.  This  requires  using 
the  target  layout  tool  to  process  test  designs  into  actual  layouts,  and  then  analyzing 
both  the  input  test  designs  and  the  resultant  layouts  to  determine  what  relationships 
occur  between  them.  These  relationships  form  the  model.  When  applied  to  an 
actual  design,  this  model  will  produce  sufficient  information  describing  a  predicted 
layout  from  which  we  can  obtain  area  and  delay  estimates.  Applying  the  model 
and  estimating  area  and  delay  in  this  way  is  orders  of  magnitude  faster  than  just 
producing  an  actual  layout,  even  for  large  designs. 

However,  while  this  approach  is  general  and  can  be  abstracted  to  other  layout 
tools,  a  set  of  input  test  designs  that  sufficiently  represent  the  expected  designs 
and  a  data  analysis  tool  are  required.  For  this  research,  the  expected  designs  are 
simple  microprocessor  aritmetic  logic  unit  (ALU)  components  using  single  cycle  clocks 
with  one  instruction  per  cycle.  Therefore,  the  suite  of  test  designs  (described  in 
Section  5.2)  is  made  up  of  these  components. 

We  considered  three  approaches  for  the  data  analysis  tool.  One  approach  was  to 
build  a  custom  data  analysis  tool  from  scratch.  While  this  would  serve  the  purpose, 
it  would  be  too  specific  to  one  tool  to  be  abstracted  to  other  tools.  A  separate 
analysis  tool  would  have  to  be  built  for  each  layout  tool  being  modeled.  Another 
approach  was  to  use  an  existing  database  analysis  tool  to  discover  trends  in  the 
layouts  based  on  their  corresponding  designs.  The  drawback  to  this  option  was 
performing  the  extensive  but  necessary  modifications  to  customize  this  analysis  tool 
so  it  could  function  on  the  target  layout  tool.  This  also  made  this  approach  too 
specific.  A  third  approach  was  to  use  a  general  purpose  machine  learning  system. 
As  it  turns  out,  we  can  readily  customize  this  type  of  system  to  function  on  a  wide 
range  of  layout  tools. 
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Having  selected  the  machine  learning  approach  for  our  data  analysis  tool,  we  con¬ 
sidered  three  types  of  learning  system  implementations:  linear  discriminants,  neural 
nets,  and  production  rule  based  systems  As  explained  in  Section  2.3, 

we  could  use  any  of  these  systems  to  learn  from  a  training  set  how  design  features 
map  to  layout  concepts.  However,  in  the  case  of  linear  discriminants  and  neural  nets, 
the  underlying  rules  would  be  hidden.  This  makes  it  very  difficult,  if  not  impossible, 
to  verify  the  rules  make  sense  and  can  be  used  with  confidence.  For  this  research,  we 
found  it  necessary  to  verify  the  rules  when  learning  from  a  limited  number  of  designs. 
For  example,  all  of  the  training  designs  in  a  single  training  set  may  have  a  certain 
combination  of  features  that  appears  to  affect  their  resultant  layouts,  when  in  fact  it 
is  just  a  coincidence.  As  a  result,  the  learning  system  would  learn  this  coincidence 
as  fact  and  produce  a  rule  that  modeled  it.  However,  this  rule  would  be  valid  only 
for  the  training  designs  and  not  hold  for  others.  In  other  words,  the  rule  would  be 
too  specific  and  not  general  enough  to  apply  to  all  designs.  If  this  situation  occurred 
while  using  linear  discriminants  or  neural  nets,  the  apparent  error  rate  would  be  low 
while  the  true  error  rate  would  be  high  (see  Section  2.3).  It  would  be  very  difficult 
to  discover  why.  If  we  were  using  a  production  rule  system,  we  could  judge  the  rule 
directly  to  determine  why  the  error  rates  were  different. 

The  approach  using  the  production  rule  based  learning  system  proved  to  be  the 
best  solution  for  other  reasons.  Unlike  linear  discriminants  or  neural  nets,  production 
rule  systems  allow  the  same  set  of  features  to  be  mapped  to  more  than  one  layout 
concept  (50,  52)  _  fpyg 

is  useful  in  the  VLSI  CAD  domain  where  we  have  real  physical 
constraints.  For  example,  a  particular  set  of  features  in  a  design  may  warrant  the 
prediction  of  a  certain  layout  concept  with  a  high  certainty  factor.  However,  due  to  a 
real  physical  limitation,  this  concept  is  not  possible.  Therefore,  it  would  be  useful  to 
have  a  “second  choice”  layout  concept.  This  alternative  layout  concept  would  come 
from  another  rule  based  on  the  same  set  of  features.  This  layout  concept  would  have 
a  lower  certainty  factor,  but  could  possibly  satisfy  both  the  rules  and  the  physical 
limitations. 

There  is  another  reason  closely  related  to  this  one.  In  this  research,  we  found 
that  an  incorrect  prediction  may  be  “close  enough”  for  estimation  purposes.  An  area 
or  delay  estimate  based  on  an  incorrect  prediction  may  have  only  a  small  percent 
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difference  from  the  same  estimate  based  on  the  correct  prediction.  This  means  that 
a  “black-and-white”  prediction  scheme  is  not  necessary.  This  “gray”  area  allows 
more  flexibility  in  the  learning  system.  In  addition  to  satisfying  physical  limitations, 
alternative  choices  for  layout  concept  predictions  could  yield  end  results  within  the 
bounds  of  an  acceptable  error.  Only  production  rule  based  systems  allow  alternative 
predictions.  In  addition  to  all  these  reasons,  the  production  rule  based  learning 
system  we  selected  for  this  research  (see  Section  2.3.3. 1)  has  already  demonstrated 
usefulness  as  a  data  analysis  tool 

3.2  Overview  of  a  Machine  Learning  Approach 

Our  approach  to  modeling  layout  tools  uses  a  machine  learning  system  to  learn 
what  features  of  a  design  relate  to  certain  layout  concepts,  such  as  relative  place¬ 
ment,  for  a  target  layout  tool.  This  mapping  of  design  features  to  layout  concepts  is 
represented  by  a  set  of  rules.  These  rules  constitute  the  model,  and  when  applied  to  a 
target  design,  these  rules  map  the  design’s  features  into  predicted  layout  information 
so  we  can  derive  estimates  of  area  and  delay. 

Figure  4  shows  a  high-level  overview  of  the  modeling  process.  The  target  layout 
tool  produces  layouts  for  a  set  of  training  designs.  Topological  and  graphical  features 
from  the  training  designs  and  relative  placement  concepts  from  the  associated  layouts 
are  built  into  training  sets.  The  learning  system  learns  from  these  training  sets  a 
generalized  set  of  rules.  These  rules  constitute  the  model  of  the  relationships  that 
result  from  using  the  layout  tool. 


Figure  4  Modeling  Process  Overview 

When  applied  to  a  target  design,  the  layout  tool  model  (the  set  of  rules  defining 
the  relationships)  determines  what  layout  concepts  result  from  the  features  in  the 
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design.  We  can  use  these  layout  concepts  to  derive  area  and  delay  estimates.  A  high- 
level  overview  of  this  process  is  shown  in  Figure  5.  The  area  and  delay  estimates  are 
accurate  because  the  rules  are  based  on  how  the  layout  tool  produces  layouts  for  the 
training  designs.  The  estimates  are  obtained  more  efficiently  because  it  takes  much 
less  time  to  apply  the  rules  than  to  produce  an  actual  layout,  especially  for  large 
designs.  (The  target  layout  tool  is  modeled  off-line  so  it  does  not  directly  affect  the 
design  time.) 


Figure  5  Applying  the  Model  Overview 


3.3  Solution  Architecture 

This  section  describes  the  solution  architecture  for  our  method  of  modeling  the 
relationships  that  result  from  using  a  layout  tool  so  we  can  quickly  obtain  accurate 
area  and  delay  estimates.  Here  we  present  a  high  level  algorithm  that  describes  how 
these  relationships  are  modeled  and  how  the  model  is  used  to  produce  area  and  delay 
estimates.  We  describe  each  step  of  the  algorithm,  using  implementation  details  from 
modeling  an  actual  layout  tool  to  clarify  the  discussion.  The  target  layout  tool  for 
this  research  is  Artistll  from  the  Keystone  Tool  Suite  The  complete  solution 

architecture  as  implemented  for  modeling  Artistll  is  shown  in  Figure  6. 
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Figure  6  Solution  Architecture 


3.3.1  Modeling  Method  Algorithm 

There  are  two  major  parts  to  the  algorithm:  producing  the  model  and  using  the 
model.  Producing  the  model  (as  shown  in  Figure  4)  uses  a  machine  learning  system 
as  a  data  analysis  tool  to  search  through  a  set  of  training  data  looking  for  trends  and 
characteristics  that  determine  the  relationships  from  using  the  target  layout  tool. 
Using  the  model  (as  shown  in  Figure  5)  involves  applying  the  model  to  a  target 
design  to  get  predicted  layout  information  and  then  using  this  information  to  derive 
area  and  delay  estimates. 
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3.3.2  Part  One:  Producing  the  Model 

The  first  major  part  of  the  algorithm  consists  of  two  procedures:  building  a 
training  set  and  analyzing  the  training  set. 


3. 3. 2.1  Procedure  One:  Building  a  Training  Set.  In  this  procedure,  the  user 
selects  a  set  of  designs  and  processes  them  through  the  target  layout  tool  into  actual 
layouts.  Information  from  the  training  designs  and  their  corresponding  layouts  is  pro¬ 
cessed  into  a  training  set.  This  training  set  represents  the  trends  and  characteristics 
that  the  target  layout  tool  exhibits  on  the  training  designs.  For  this  application,  a 
training  set  must  identify  design  features  that  affect  the  layouts.  It  must  also  include 
sufficient  information  so  it  can  adequately  represent  the  expected  target  designs. 

If  the  target  layout  tool  is  reconfigurable  and/or  has  different  execution  parame¬ 
ters,  the  user  needs  to  build  different  training  sets.  The  same  training  designs  may 
be  used,  but  different  layouts  are  produced  using  the  target  layout  tool  in  the  desired 
configurations  with  the  desired  execution  parameters.  Figure  7  shows  the  five  steps 
in  building  a  training  set. 

Step  1.  Select  set  of  training  designs. 

Step  2.  Pre-process  each  training  design  into  a  data  format 
that  the  target  layout  tool  can  handle. 

Step  3.  Use  target  layout  tool  to  produce  a  layout  for 
each  training  design. 

Step  4.  Transform  data  from  each  training  design  and  its 
iayout  into  training  set  format  and  add  to  training  set. 

Step  5.  (Optionai)  Repeat  Steps  1  through  4  if  target  layout 
tool  has  different  configurations  and/or  execution  parameters. 


Figure  7  Building  a  Training  Set 
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Step  1.  The  first  step  in  building  a  training  set  is  to  select  a  set  of  training 
designs.  We  use  these  designs  to  determine  the  characteristics  of  the  layout  tool  being 
modeled.  For  this  research,  the  target  designs  are  microprocessor  ALU  components, 
so  the  set  of  training  designs  consist  of  functional  units  that  are  commonly  found  in 
ALUs  (i.e.  adders,  multipliers,  subtracters,  absolute  value  operators,  incrementers, 
decrement ers,  etc.). 

A  training  set  is  considered  complete  when  it  reaches  a  point  where  a  substantial 
increase  to  its  size  does  not  increase  the  performance  of  the  resultant  model  (set  of 
rules)  (This  performance  is  measured  by  applying  the  model  and  comparing 

the  results  to  those  from  using  the  real  layout  tool.)  In  the  context  of  this  research, 
a  substantial  increase  means  adding  more  training  designs  to  the  training  set.  The 
appropriate  number  of  training  designs  is  determined  through  experimentation  and 
analysis  of  the  results.  However,  the  initial  training  set  should  include  a  sufficient 
representative  sample  of  designs  that  the  model  is  expected  to  process.  For  example, 
one  of  the  initial  training  sets  for  the  Artistll  layout  tool  consists  of  five  adder  designs, 
each  using  identical  architectures,  but  with  different  bit  widths  (4,  8,  12,  16,  and  32- 
bits).  This  training  set  is  sufficient  for  adder-based  components  because  it  consists 
of  adder  designs  with  the  most  commonly  used  bit  widths  and  one  with  an  unusual 
bit-width. 

To  make  the  method  as  general  as  possible,  the  training  designs  should  be  gen¬ 
erated  at  a  high  level  so  they  can  be  translated  into  whatever  intermediate  form  of 
representation  is  appropriate  for  the  target  layout  tool.  The  training  designs  should 
also  be  independent  of  the  layout  tool  being  modeled.  In  addition,  the  overall  func¬ 
tionality  of  the  training  designs  should  not  depend  on  what  node  types  the  target 
layout  tool  can  handle.  (In  the  context  of  this  research,  a  node  in  a  design  can  be 
either  a  primitive  gate  or  a  standard  cell.  It  is  not  necessary  to  make  a  distinction 
between  these  two  because  this  is  a  general  method  that  deals  at  a  high  level  where 
a  primitive  gate  is  treated  the  same  as  a  standard  cell.)  Also,  the  training  designs 
themselves  should  only  deal  with  functionality,  degree,  and  connectivity  for  each 
node  and  interconnections  among  the  nodes.  However,  only  those  node  types  that 
the  target  layout  tool  can  process  can  be  used  to  implement  the  training  designs. 
This  means  that  each  training  design  must  be  representable  as  a  netlist  of  node  types 
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that  the  target  layout  tool  can  process. 

Step  2.  The  second  step  is  to  process  the  training  designs  into  a  format  that  the 
target  layout  tool  can  handle.  This  can  be  done  by  the  high  level  synthesis  system 
that  uses  the  target  layout  tool.  This  avoids  the  necessity  to  create  a  new  tool. 

The  target  layout  tool  for  this  research,  Artistll,  requires  representation  of  designs 
in  the  GLUE  format.  This  was  done  by  using  tools  from  the  Keystone  Tool 

Suite.  As  shown  in  Figure  6,  the  training  designs,  written  in  VHDL  were 

translated  into  the  GLUE  format  by  compiling  them  using  the  tool  called  VCOMP 
(40)  VCOMP  compiles  the  designs  into  the  Intermediate  VHDL  Format  (IVF). 
Another  tool,  IVF2GLUE  was  used  to  translate  this  IVF  format  into  the 

GLUE  format. 

Step  3.  The  third  step  is  to  process  each  training  design  through  the  target 
layout  tool  into  an  actual  layout.  If  the  target  layout  tool  is  reconfigurable  and/or 
has  user  set  parameters,  these  should  be  set  to  the  desired  values  that  the  model  is  to 
reflect.  A  training  set  and  the  resultant  model  can  only  reflect  one  configuration  of 
the  layout  tool  using  one  set  of  parameter  values.  (If  this  method  is  used  to  model  a 
target  layout  tool  for  a  range  of  configurations  and  parameter  settings,  then  a  super 
set  of  training  sets  should  be  produced.  This  will  allow  the  production  of  different 
models.  Then,  we  can  select  the  appropriate  model  to  represent  the  target  layout 
tool  based  on  the  desired  configuration  and/or  parameter  settings.) 

For  this  research,  we  used  Artistll  to  process  each  training  design  into  an  actual 
layout.  As  described  in  Section  2.2.2,  an  Artistll  layout  is  represented  by  a  grouping 
description  (.grp)  file  that  indicates  the  row  and  column  location  of  each  primitive 
gate  in  the  layout.  The  gates’  relative  placement  information  is  obtained  from  this 
grouping  description  file. 

Step  4.  The  fourth  step  involves  processing  the  training  design  and  the  layout 
into  a  training  set  data  format.  This  format  is  determined  by  the  analysis  tool. 
For  this  research,  the  analysis  tool  is  a  machine  learning  system.  This  learning 
system  determines  if  there  are  relationships  between  the  training  designs  and  their 
corresponding  layouts.  Another  software  tool  called  Match  (written  exclusively  for 
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this  research)  processes  information  from  a  flattened  description  file  and  the  layout 
placement  file  into  the  appropriate  training  set  data  format. 

Each  training  set  includes  a  set  of  attribute- value  pairs  describing  various  features 
about  the  training  design  and  a  set  of  layout  concepts  that  indicate  relative  placement 
of  the  nodes  in  the  layout.  The  learning  system  is  expected  to  learn  what  relationships 
exist  between  the  training  design  features  and  the  layout  concepts. 

As  described  earlier,  the  training  set  must  identify  design  features  that  affect  the 
layouts  so  we  can  learn  what  relationships  exist  between  them.  To  do  this,  we  used 
graph  application  principles  to  determine  what  features  to  describe.  Therefore,  the 
attribute-value  pairs  describe  features  dealing  with  connectivity  and  degree  for  each 
node,  and  topology  for  the  rest  of  the  design  in  relation  to  the  node.  Attribute- value 
pairs  also  describe  both  individual  and  relative  features  of  the  nodes.  Individual 
node  features  include  the  minimum  area,  height,  and  width  of  each  node.  They  also 
include  the  associated  delays  for  rising  and  falling  output  transitions,  the  number  of 
inputs,  and  the  node’s  function.  Relative  node  features  are  explained  below.  (Note: 
all  of  this  information  is  obtained  from  a  node  library  as  the  training  set  is  built.) 
While  some  of  the  features  seem  redundant,  such  as  area  when  height  and  width 
are  available,  it  is  necessary  to  explicitly  describe  all  features.  This  is  because  the 
learning  system  is  incapable  of  learning  relationships  among  the  features.  It  is  an 
inductive  learning  system  and  can  only  learn  by  observation. 

Because  it  is  necessary  to  pair  up  each  node  with  every  other  node  in  order  to 
obtain  relative  placement  information  from  the  layout,  node  pairs  are  used  to  form 
the  individual  training  instances  in  the  training  set.  This  requires  pairing  up  each 
node  with  every  other  node  in  the  training  design.  This  node  pairing  produces  a 
training  set  of  [n^— n]/2  pairs  where  n  is  the  number  of  nodes.  While  this  is  O(n^), 
it  can  be  reduced  somewhat  because  it  is  meaningless  to  pair  up  a  node  with  itself 
(the  —n  factor),  and  mirrored  pairings  provide  the  same  data  (division  by  2).  The 
features  for  each  node  pair  include  each  node’s  individual  set  of  attribute- value  pairs 
as  described  previously,  plus  a  set  of  relational  attribute- value  pairs.  These  relational 
features  describe  how  the  two  nodes  relate  to  each  other.  This  includes  information 
about  which  node  has  more  area,  more  height,  more  width,  more  delay,  more  total 
inputs,  more  external  inputs,  and  more  internal  inputs. 
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Other  relational  features  include  whether  or  not  one  node  has  an  input  to,  output 
from,  or  no  connection  to  the  other  node,  and  if  the  two  nodes  share  a  common 
destination  or  a  common  source.  Once  again,  it  is  necessary  to  explicitly  describe 
these  relationships  between  the  two  nodes  because  the  learning  system  is  incapable 
of  learning  them. 

We  can  determine  if  the  list  of  training  design  features  is  complete  by  using  the 
same  test  as  the  training  set.  If  the  list  is  complete,  the  performance  of  the  system 
is  not  increased  by  adding  more  features.  This  is  also  determined  through  experi¬ 
mentation.  For  this  research,  the  initial  list  of  features  came  from  the  CAD  domain. 
The  connectivity  and  topology  features  are  based  on  relevant  features  found  in  graph 
applications.  The  physical  features  deal  with  the  main  goal  of  the  research:  estimat¬ 
ing  area  and  delay.  (Our  goal  for  the  initial  list  is  to  include  as  much  descriptive 
information  as  possible.  Then  we  can  prune  the  information  that  does  not  show  up 
in  the  relationships.) 

The  layout  information  (relative  placement)  for  each  node  pair  is  obtained  by 
examining  the  layout  to  determine  where  the  nodes  are  located  relative  to  each  other. 
For  example,  they  may  be  located  next  to  each  other,  at  opposite  ends  of  the  layout, 
or  somewhere  in  between.  Each  node  pair  has  a  relative  placement  as  its  concept, 
and  the  learning  system  tries  to  learn  what  features  about  the  two  nodes  determine 
where  they  are  placed  relative  to  each  other.  We  can  use  the  relative  placement  to 
derive  the  relative  distance.  The  relative  distance  determines  wirelength  between  the 
logically  connected  nodes. 

The  relative  placement  concepts  we  use  are  classifications  based  on  the  Manhattan 
distance  between  the  two  nodes  in  the  node  pair.  These  classifications  could  include 
only  one  distance  or  more  than  one,  depending  on  how  well  the  resultant  model  per¬ 
forms.  However,  the  same  distance  may  not  be  used  by  more  than  one  concept.  The 
learning  system  requires  discrete  categorization  and  overlapping  distances  perturbes 
the  learning  process.  (This  is  explained  more  in  Section  4.1.) 

For  this  research,  the  Match  program  parses  through  each  training  design’s  flat¬ 
tened  description  (.fla)  file  and  the  corresponding  layout  placement  (.grp)  file  to  get 
the  raw  training  data.  As  shown  in  Figure  6,  the  flattened  description  file  is  produced 
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by  a  tool  called  Clusterll  The  primary  purpose  of  this  tool  is  producing  an 

initial  gate  clustering  that  helps  speed  up  and  improve  the  layout  processing  done  by 
Artistll.  However,  for  this  research,  Clusterll  is  used  only  to  produce  the  flattened  file 
description,  and  not  for  initial  gate  clustering.  (If  we  were  to  use  Clusterll  to  provide 
an  initial  clustering,  the  resultant  layouts  would  be  different  and  the  learning  system 
would  learn  relationships  due  to  a  combination  of  Artistll  and  Clusterll.  We  only 
wished  to  model  Artistll,  so  we  did  not  use  Clusterll  in  the  clustering  mode.)  Once 
all  training  designs  and  layouts  have  been  processed  into  the  proper  data  format,  all 
of  the  data  is  accumulated  into  one  training  set. 

However,  before  the  learning  system  can  learn  rules  from  the  training  set,  the 
Match  program  must  build  one  more  file.  This  file  is  called  the  Partial  Domain 
Model  (PDM)  (This  domain  information  is  contained  in  the  .names  file 

shown  in  Figure  6.)  The  purpose  of  the  PDM  file  is  to  describe  the  target  domain  so 
the  learning  system  has  a  context  in  which  to  operate.  The  PDM  file  does  this  by 
listing  the  concepts  we  want  the  learning  system  to  learn  and  by  listing  the  relevant 
features  that  describe  our  domain.  (These  are  the  same  concepts  and  features  as 
described  above.) 

Because  this  file  can  not  have  an  infinite  size,  it  can  only  describe  part  of  the 
domain,  hence  the  name  “Partial”  Domain  Model.  In  addition,  the  PDM  should  not 
be  confused  with  the  target  layout  tool  model,  which  is  a  set  of  rules.  The  PDM 
file  can  be  as  simple  or  as  complex  as  necessary.  The  complexity  depends  on  the 
relevant  features  in  the  domain  and  the  desired  concepts  to  be  learned.  The  PDM 
file  provides  a  powerful  way  to  generalize  the  learning  system  so  it  can  learn  in  any 
domain.  Figure  8  shows  one  of  the  PDM  files  used  in  this  research.  This  file  lists  the 
layout  concepts  that  we  expect  the  learning  system  to  learn  on  the  first  line.  The 
design  features  are  listed  on  the  remaining  lines.  This  file  also  describes  the  type 
(symbolic,  numeric)  and  admissible  values  for  each  feature  in  the  training  set. 

For  example,  this  particular  PDM  file  shows  seven  layout  concepts:  near^s,  neav-d, 
close.  1,  close.2,  far.l,  far. 2,  and  way. out.  (The  reasons  for  these  particular  concepts 
are  explained  in  Section  4.1.)  These  concepts  represent  relative  placements  of  the 
two  nodes  in  each  training  instance.  The  relative  placement  of  two  nodes  indicates 
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how  far  apart  the  nodes  are.  Table  3  shows  the  seven  relative  placement  concepts 
and  the  corresponding  number  of  positions  apart.  The  concept  of  nears  indicates 
that  the  two  nodes  are  placed  next  to  each  on  the  same  row.  The  concept  of  near-d 
indicates  that  the  two  nodes  are  next  to  each  other,  but  on  different  rows.  In  other 
words,  one  node  is  above  and  one  node  is  below.  The  concept  of  close-1  indicates  the 
two  nodes  are  two  or  three  positions  apart  from  each  other,  on  the  same  row  or  on 
different  rows.  The  concept  of  close-2  indicates  the  nodes  are  four  or  five  positions 
apart.  The  concept  of  far-1  indicates  the  two  nodes  are  six  or  seven  positions  apart. 
The  concept  of  far-2  indicates  the  two  nodes  are  eight  or  nine  positions  apart.  The 
concept  of  way-Out  indicates  the  two  nodes  are  10  or  more  positions  apart. 


Table  3  Relationship  of  Node  Position  to  Relative  Placement 


Relative 

Placement 

Concept 

Number  of 
Positions 
Apart 

near_s 

near.d 

close.l 

close_2 

far.l 

far_2 

way_out 

1  (same  row) 

1  (different  row) 
2-3 

4-5 

6-7 

8-9 

10+ 

The  remainder  of  the  file  shows  the  design  features  and  their  types  and  admissible 
values.  For  example,  the  design  feature  log-Conn  indicates  if  the  two  nodes  in  the 
training  instance  are  logically  connected.  The  admissible  values  are  either  yes  or 
no.  (The  admissible  values  also  imply  the  type;  in  this  case  yes  and  no  indicate  a 
symbolic  type.)  Another  feature  such  as  area-1,  which  indicates  the  area  required  by 
the  first  node  in  the  node  pair,  has  a  continuous  admissible  value.  This  means  it  is  a 
numeric  type  and  can  take  on  any  numerical  value.  Once  the  training  set  and  PDM 
files  are  complete,  the  learning  system  can  begin  analyzing  the  training  set  in  order 
to  produce  rules. 

Step  5.  The  user  may  repeat  steps  1  through  4  if  the  target  layout  tool  has 
different  configurations  and/or  different  execution  parameters  and  these  need  to  be 
modeled.  The  user  repeats  steps  1  through  4  for  each  desired  configuration  of  the 
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near_s,  near_d,  close_l,  close_2,  far_l,  far_2,  way_out.  |  concepts 


log_conn:  yes,  no. 

com.dest: 

yes,  no. 

com_src 

:  yes ,  no . 

al_a2: 

>,  =,  <. 

hl_h2: 

>,  =,  <. 

wl_w2: 

V 

II 

A 

nil_ni2 

V 

II 

A 

nxl_nx2 : 

V 

n 

A 

ntl_nt2 

V 

II 

A 

f unl_f un2 : 

“ »  • 

area_l : 

continuous . 

height. 1 : 

continuous 

width. 1 

continuous. 

delay.l : 

continuous 

funct.l 

ai2 , ai3 , ai4 , aoi21 , aoi211 , aoi22 , 

il. 

oai21,oai211 ,oai22,oi2,oi3,oi4. 

input s.l : 

continuous 

ext_l : 

continuous . 

int.l : 

continuous 

llevl.l 

continuous . 

llev2.1: 

continuous 

llev3_l 

continuous . 

llev4.1: 

continuous 

21evl_l 

continuous . 

21ev2_l: 

continuous 

21ev3_l 

continuous . 

21ev4_l: 

continuous 

area_2 : 

continuous . 

height. 2: 

continuous 

width_2 

continuous . 

del ay. 2 : 

continuous 

funct_2 

ai2 , ai3 , ai4 , aoi21 , aoi21 1 , aoi22 , 

il. 

oai21,oai211,oai22,oi2,oi3,oi4. 

inputs. 2: 

cont inuous 

ext_2: 

continuous . 

int.2: 

continuous 

llevl_2 

continuous . 

llev2.2: 

continuous 

llev3_2 

continuous . 

llev4.2: 

continuous 

21evl_2 

continuous . 

21ev2.2: 

continuous 

21ev3_2 

continuous . 

21ev4.2 : 

cont inuous 

Figure  8  Example  Partial  Domain  Model 


target  layout  tool  using  a  particular  set  of  parameter  values.  Only  one  configuration 
using  one  set  of  parameters  may  be  represented  by  a  single  training  set.  Therefore, 
different  training  sets  are  required  to  produce  different  models.  Each  model  represents 
the  same  target  layout  tool,  but  for  different  configurations  using  different  parameter 
values. 

For  example,  one  of  the  validation  designs  for  this  research  is  a  16-bit  multiplier 
(see  Section  5.2).  This  design  consists  of  about  3000  primitive  gates.  We  discovered 
that  Artistll  required  more  than  a  couple  of  weeks  to  produce  a  layout  for  this  design 
using  10,000  iterations.  (The  number  of  iterations  is  one  of  the  execution  parameters 
for  Artistll.  Larger  numbers  of  iterations  produces  smaller  layouts.)  We  also  found 
that  running  Artistll  continuously  for  two  or  more  weeks  was  difficult  to  achieve 
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on  time-shared  computing  systems  that  reboot  weekly.  Therefore,  it  was  necessary 
to  reduce  the  number  of  iterations  to  100  in  order  to  get  a  layout  for  comparison 
purposes.  However,  if  we  used  a  training  set  based  on  10,000  iterations  to  produce  a 
model,  and  we  used  this  model  to  process  the  16-bit  multiplier,  the  resultant  layout 
should  not  be  compared  to  an  actual  layout  generated  by  Artistll  using  only  100 
iterations.  The  comparison  would  be  invalid.  Therefore,  a  new  training  set  consisting 
of  the  same  training  designs  but  with  actual  layouts  generated  by  Artistll  using  only 
100  iterations  was  necessary  to  train  the  learning  system  to  produce  a  new  model. 
Using  this  new  model  on  the  16-bit  multiplier  provided  a  valid  test. 


3. 3. 2. 2  Procedure  Two:  Analyzing  the  Training  Set.  This  procedure  has  three 
steps.  The  training  set  is  analyzed  for  relationships  between  design  features  and  lay¬ 
out  concepts,  any  relationships  found  are  put  into  a  list,  and  if  no  relationships  are 
found,  the  process  is  repeated  using  different  analysis  tool  parameters.  Figure  9 
shows  these  steps. 

Step  1 .  Analyze  training  set  data  to  determine  if  there  are  any 
relationships  between  training  design  features  and 
corresponding  layout  concepts. 

Step  2.  If  relationships  are  found,  accumuiate  them  into  a 
list.  This  iist  constitutes  the  model  of  the  target  layout  tool. 

Step  3.  If  no  relationships  are  found,  use  different  analysis 
parameters  and  repeat  Step  1.  If  no  relationships  are  found 
after  several  tries,  use  a  different  training  set. 

Figure  9  Analyzing  the  Training  Set 

Step  1.  The  first  step  in  analyzing  the  training  set  data  is  to  determine  if  there  are 
any  relationships  between  features  in  the  training  design  and  the  corresponding  layout 
concepts.  These  relationships  indicate  trends  and  characteristics  of  the  target  layout 
tool.  For  example,  if  the  target  layout  tool  tends  to  place  logically  connected  nodes 
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close  together,  this  will  come  out  in  the  analysis.  It  will  show  up  as  a  relationship 
between  logically  connected  nodes  and  close  placement. 

This  analysis  is  performed  by  the  machine  learning  system.  The  learning  system 
searches  through  the  training  set  data  looking  for  evidence  that  supports  relationships 
between  the  design  features  and  the  relative  placement  concepts.  If  the  evidence 
for  a  particular  relationship  is  strong  enough,  a  rule  describing  that  relationship  is 
produced.  The  necessary  strength  is  determined  by  a  parameter  set  by  the  user 
before  executing  the  learning  system.  (This  and  other  aspects  of  the  learning  system 
in  analyzing  the  training  set  are  described  more  in  Section  4.2.) 

Step  2.  The  second  step  is  to  produce  a  list  of  relationships  that  exist.  The 
learning  system  does  this  by  producing  a  set  of  rules  that  formally  and  logically 
describe  the  relationships.  As  described  in  Section  2.3.3,  these  rules  are  of  the  form  < 
Ihs  >  -^  <  rhs  >,  where  <  Ihs  >  is  a  conjunction  of  features  (attribute- value  pairs) 
and  <  rhs  >  is  the  layout  placement  concept  that  the  learning  system  determines  is 
related  to  the  features  m  <  Ihs  >.  An  example  set  of  rules  is  shown  in  Figure  10. 
Because  these  rules  are  based  on  relationships  between  features  in  the  training  designs 
and  corresponding  layout  concepts,  they  reflect  the  characteristics  of  the  target  layout 
tool.  When  applied  to  a  target  design,  these  rules  map  features  in  the  design  to  layout 
concepts  and  provide  layout  information  that  can  be  used  to  estimate  area  and  delay. 

For  example,  the  first  rule  maps  a  relative  placement  concept  of  neared  to  node 
pairs  that  have  a  different  number  of  external  inputs.  The  statistics  included  in  this 
rule  show  that  64  training  instances  indicate  the  concept  of  neared.  Out  of  these 
training  instances,  31  instances  indicate  the  node  pairs  have  a  different  number  of 
external  inputs.  This  works  out  to  a  48%  belief  in  the  rule.  On  the  other  hand,  the 
statistics  also  show  that  5872  training  instances  have  a  relative  placement  concept 
other  than  neared.  Out  of  these  instances,  1099  instances  indicate  the  node  pairs 
have  a  different  number  of  external  inputs.  This  works  out  to  a  19%  disbelief  in  the 
rule.  All  of  this  generates  a  certainty  factor  of  0.710,  which  is  explained  next. 

As  shown  in  Figure  10,  each  rule  has  a  certainty  factor  associated  with  it.  (Cer¬ 
tainty  factors  were  introduced  in  Section  2.3.)  The  original  certainty  factors  (c/) 
used  by  the  Rule  Learner  system  were  generated  by  Equation  3-1,  where  p  is  the 
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(nxl_nx2  !=)  ==>  near_d 

{  pos=  0.48,  neg=  0.19,  cf=  0.710 
p=31,  n=1099,  tp=64,  tn=5872  } 

(nxl_nx2  !=)  ==>  near_s 

{  pos=  0.41,  neg=  0.18,  cf=  0.692 

p=109,  n=1021,  tp=266,  tn=5670  } 

(nil_iii2  ==)  ==>  way_out 

{  pos=  0.38,  neg=  0.17,  cf=  0.692 

p=1450,  n=356,  tp=3826,  tn=2110  } 

(funl_fun2  ==)  ==>  way_out 

{  pos=  0.32,  neg=  0.16,  cf=  0.669 

p=1220,  n=332,  tp=3826,  tn=2110  } 

(al_a2  ==)  (hl_h2  !=)  ==>  near.s 

{  pos=  0.70,  neg=  0.39,  cf=  0.640 

p=186,  n=2210,  tp=266,  tn=5670  } 

(al_a2  ==)  (hl_h2  !=)  ==>  near_d 

{  pos=  0.64,  nGg=  0.40,  cf=  0.607 

p=41,  n=2355,  tp=64,  tn=5872  } 

(com.src  yes)  ==>  close_l 
{  pos=  0.14,  neg=  0.09,  cf=  0.605 

p=75,  n=475,  tp=546,  tn=5390  } 


Figure  10  Example  Set  of  Rules  No.  1 


number  of  examples  in  the  training  set  supporting  belief  in  the  rule  (called  positive 
examples),  and  n  is  the  number  of  examples  supporting  disbelief  in  the  rule  (called 
negative  examples). 


cf  =  (3- 

p  +  n 

However,  due  to  the  large  inequality  of  the  number  of  training  instances  for  each 
class,  Equation  3-1  did  not  produce  a  good  measure  of  a  rule’s  certainty.  Through 
experimentation  with  many  training  sets  from  this  domain,  we  found  that  the  total 
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number  of  negative  examples  was  usually  much  larger  than  the  total  number  of 
positive  examples.  This  tended  to  force  the  certainty  factors  downward  by  causing 
n  to  be  much  larger  than  p.  But,  if  we  look  at  the  ratio  of  the  number  of  positive 
examples  of  the  rule  to  the  total  number  of  positive  examples  and  the  ratio  of  the 
number  of  negative  examples  of  the  rule  to  the  total  number  of  negative  examples, 
we  can  obtain  a  better  perspective  of  the  rule’s  certainty. 

We  obtained  a  better  perspective  of  a  rules’s  certainty  by  using  Equation  3-2, 
where  p  and  n  are  the  same  as  before,  tp  is  the  total  number  of  examples  having  the 
same  concept,  and  tn  is  the  total  number  of  examples  not  having  the  same  concept. 

Cf  =  (3-2) 

tp  '  tn 

This  formula  normalizes  the  certainty  factors  by  dividing  the  positive  and  negative 
examples  of  the  rules  by  their  respective  total  number  of  examples.  This  reduces  the 
effects  that  a  very  large  total  number  of  positive  or  negative  examples  have  on  the 
certainty  factors.  Further  examination  of  this  formula  shows  that  when  the  ratios  of 
p/tp  and  n/tn  are  equal  in  value,  a  certainty  factor  of  0.5  is  generated.  This  means 
that  the  rule  has  a  50%  chance  of  being  true  or  false.  This  makes  sense  when  we 
consider  a  certainty  factor  above  0.5  as  showing  support  for  belief  in  the  rule,  and  a 
certainty  factor  below  0.5  as  showing  support  for  disbelief  in  the  rule. 

We  developed  this  method  for  calculating  the  certainty  factor  to  accommodate 
limitations  in  this  domain,  namely  the  highly  unequal  distribution  of  positive  and 
negative  examples.  Therefore,  as  in  other  applications  these  certainty  factors 

should  not  be  viewed  as  probabilities  of  a  rule’s  truth.  Rather,  these  certainty  factors 
should  be  used  to  rank  order  the  rules  and  to  weight  the  rules  as  they  are  used  to 
classify  the  test  cases. 

When  more  than  one  rule  predicts  the  same  concept  for  a  node  pair,  the  certainty 
factors  are  combined  into  one  factor  If  other  rules  predict  another  concept  for 
the  same  node  pair,  those  rules’  certainty  factors  are  combined.  This  is  repeated  until 
each  concept  that  results  from  this  same  node  pair  has  only  one  certainty  factor  asso¬ 
ciated  with  it.  These  combined  certainty  factors  are  used  to  rank  order  the  predicted 
concepts  for  that  node  pair.  The  concept  with  the  highest  combined  certainty  factor 
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becomes  the  “first  choice”  for  that  node  pair.  The  concept  with  the  next  highest 
combined  certainty  factor  becomes  the  “second  choice”,  and  so  on. 

The  certainty  factors  are  combined  according  to  Equation  3-3,  where  X  and  Y 
are  the  two  certainty  factors  we  wish  to  combine  (order  does  not  matter).  If  we 
wish  to  combine  more  than  two  certainty  factors,  we  start  out  by  combining  the  first 
two.  This  combined  certainty  factor  is  then  combined  with  the  next  certainty  factor, 
and  so  on.  This  equation  prevents  certainty  factors  from  combining  to  a  value  larger 
than  one.  However,  the  combined  certainty  factor  will  approach  one  eventually,  given 
enough  certainty  factors. 


cUmb{X,Y)  =  XYY{l-X)  (3-3) 

Also,  the  learning  system  was  originally  set  up  to  use  positive  and  negative  thresh¬ 
olds  as  search  parameters.  Only  the  rules  that  had  pjtp  above  a  certain  threshold 
(user-defined  at  runtime)  and  n/tn  below  a  certain  threshold  (also  user-defined)  were 
produced.  Using  this  method  in  the  VLSI  CAD  domain  produced  very  few  good 
rules.  However,  by  using  the  certainty  factors  as  search  parameters  (produce  only 
those  rules  with  a  certainty  factor  above  a  certain  user-defined  threshold),  more  good 
rules  were  produced.  In  addition,  by  setting  the  certainty  factor  threshold  high,  more 
conjuncts  were  used  in  the  left-hand  side  of  the  rules  and  provided  more  information 
on  how  the  design  features  map  to  the  layout  concepts. 

Step  3.  If  no  relationships  are  found,  the  third  step  is  to  repeat  Step  1  using 
different  analysis  parameters.  For  the  machine  learning  system,  this  means  using 
different  biases  and  learning  again  from  the  same  training  set.  These  user- 
specified  biases  include  the  desired  range  of  certainty  factors  for  acceptable  rules,  the 
maximum  number  of  features  that  may  appear  in  the  left-hand  side  of  a  rule,  and 
the  beam  width  for  the  beam  search  (see  Section  2.3).  The  certainty  factor  range 
determines  the  upper  and  lower  limits  on  certainty  factor  values.  Only  those  rules 
whose  certainty  factors  fall  within  this  range  are  accepted. 

The  biases  should  be  thought  of  as  filters.  They  may  be  adjusted  to  filter  out 
relationships  based  on  certainty  factors  or  the  number  of  features  required  in  the  left- 
hand  side  or  adjusted  to  allow  more  relationships  through.  For  example,  increasing 
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the  beam  width  allows  the  search  routine  to  open  up  more  nodes  and  find  more 
relationships.  If  rules  are  not  found  using  a  certainty  factor  range  of  0.8  to  0.9,  then 
maybe  using  a  range  of  0.7  to  0.9  would  produce  better  results.  In  addition,  increasing 
the  maximum  number  of  features  that  may  appear  in  the  left-hand  side  will  generate 
more  rules.  Of  course,  all  of  this  depends  on  whether  or  not  the  relationships  exist. 

3.3.3  Part  Two:  Using  the  Model 

The  set  of  rules  produced  by  the  learning  system  makes  up  the  model  of  the 
relationships  from  using  the  target  layout  tool.  To  apply  the  model,  the  target 
design  must  be  pre-processed  in  the  same  fashion  as  a  training  design.  This  is  shown 
in  Figure  6.  However,  a  layout  is  not  produced.  Instead,  the  model  determines 
what  layout  concepts  result  from  certain  features  in  the  design.  These  concepts 
indicate  the  relative  placement  of  each  node  pair  in  the  target  design.  This  relative 
placement  information  is  then  provided  to  our  estimator  so  it  can  derive  area  and 
delay  estimates. 


3.3.3. 1  Procedure  Three;  Applying  the  Model.  As  shown  in  Figure  11,  there 
are  three  steps  in  this  procedure.  The  target  design  is  pre-processed,  the  model  is 
applied,  and  if  the  model  produces  no  predicted  layout  results,  a  different  model  is 
selected  and  the  process  is  repeated. 

Step  1.  The  first  step  is  to  pre-process  the  target  design  into  the  proper  for¬ 
mat.  This  format  is  identical  to  that  for  a  training  design,  but  because  no  layout  is 
produced,  no  relative  placement  information  is  included.  Pre-processing  the  target 
design  into  this  format  makes  it  straightforward  to  match  features  from  the  target 
design  to  those  in  each  of  the  rules. 

Step  2.  The  second  step  searches  through  the  features  of  each  node  pair  in  the 
target  design  data  to  determine  if  any  features  match  those  in  the  set  of  rules.  The  set 
of  rules  are  listed  in  descending  order  according  to  their  certainty  factors.  The  target 
design  data  is  searched  first  using  the  rule  with  the  highest  certainty  factor.  The  data 
is  searched  again  using  the  rule  with  the  next  highest  certainty  factor,  and  so  on. 
Certainty  factors  are  combined  as  necessary  (as  described  in  Section  3. 3. 2. 2)  until 
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Step  1.  Pre-process  target  design  into  proper  format. 

Step  2.  Appiy  the  modei  by  searching  the  target  design 
data  to  determine  if  any  target  design  features  match 
features  in  the  iist  of  relationships.  If  matches  are 
found,  add  the  corresponding  iayout  concepts  to  the 
predicted  iayout  information. 

Step  3.  If  no  matching  design  features  are  found,  seiect 
a  different  modei  and  repeat  Step  2. 

Figure  11  Applying  the  Model 

each  node  pair  has  a  list  of  predicted  concepts  rank-ordered  according  to  their  final 
combined  certainty  factor.  One  of  the  main  procedures  in  the  Rule  Learner  system  is 
called  the  Inference  Engine  The  Inference  Engine  applies  the  rules  to  the  target 
design  data  and  produces  a  file  containing  all  node  pairs  and  their  predicted  relative 
placement  concepts. 

Step  3.  This  step  is  executed  if  no  matches  are  found.  A  different  model  of  the 
same  target  layout  tool  is  selected  and  Step  2  is  repeated.  Of  course,  this  requires 
different  models  to  have  been  produced  already.  Because  different  models  represent 
different  configurations  of  the  target  layout  tool,  using  a  different  model  is  the  same  as 
using  the  layout  tool  in  a  different  configuration.  Therefore,  the  selection  of  a  different 
model  is  based  on  what  target  layout  tool  configuration  would  produce  better  results. 
As  it  is  with  actual  layout  tools,  some  models  may  not  produce  satisfactory  layouts 
for  certain  designs. 


3. 3. 3.2  Procedure  Four;  Estimating  Delay.  Using  the  layout  information  (rel¬ 
ative  placement)  obtained  by  applying  the  rules  to  a  target  design,  the  delay  estima¬ 
tor  finds  the  design’s  critical  path  and  calculates  the  worst  case  delay.  The  steps  are 
shown  in  Figure  12. 
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Step  1.  Use  predicted  layout  information  to  produce  a  set  of 
estimated  wirelengths  between  all  logically  connected  nodes. 

Step  2.  Perform  critical  path  analysis  based  on  critical  path’s 
output  node  having  a  low-high  transition. 

Step  3.  Repeat  Step  2  based  on  the  critical  path’s  output 
node  having  a  high-low  transition. 

Figure  12  Estimating  Delay 

Step  1.  The  first  step  in  estimating  delay  is  to  derive  the  necessary  wirelengths 
from  the  layout  information.  We  can  derive  the  relative  distance  between  all  logically 
connected  node  pairs  from  their  predicted  relative  placement.  This  relative  distance  is 
translated  into  an  estimated  wirelength.  For  example,  a  predicted  relative  placement 
concept  of  nears  means  the  two  nodes  are  next  to  each  other  on  the  same  row. 
This  translates  into  a  relative  distance  of  one  unit.  For  Magic  files  produced  by  the 
Artistll  layout  tool  using  a  two  micron  process,  one  unit  of  distance  translates  into 
about  70  microns  of  wirelength.  We  found  this  average  unit  distance  by  performing 
a  statistical  analysis  of  the  distance  between  node  pairs  that  were  next  to  each  other 
for  many  test  and  training  design  layouts. 

Steps  2  and  3.  The  second  and  third  steps  determine  the  critical  path  delay 
by  performing  an  exhaustive  depth-first  search  of  all  possible  input-to-output  paths 
in  the  target  design.  Step  2  does  this  for  a  high-low  transition  on  the  path’s  output 
node  while  Step  3  does  this  for  a  low-high  transition  on  the  path’s  output  node. 
The  internal  delay  for  each  node  along  a  path  is  obtained  from  the  library  file  and 
the  wirelength  between  nodes  is  known  from  Step  1.  The  wirelength  determines  the 
wiring  delay.  We  used  delay  equations  from  research  on  the  Tinker  Tool  system 

to  calculate  the  wiring  delay.  The  internal  node  delays  and  the  wiring  delays  are 
summed  along  each  path  to  produce  that  path’s  overall  delay.  The  path  with  the 
worst  delay  is  considered  the  critical  path.  The  delay  of  this  path  is  considered 
the  worst  case  delay  for  the  target  design,  based  on  either  a  high-low  or  low-high 
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transition  on  the  critical  path’s  output. 

Of  course,  performing  an  exhaustive  search  of  all  possible  input-to-output  paths 
is  time-consuming  for  large  designs.  However,  we  have  shown  that  we  can  pre-process 
the  target  design,  apply  the  model,  and  estimate  both  area  and  delay,  all  in  a  time 
much  faster  than  just  producing  an  actual  layout. 


3. 3. 3. 3  Procedure  Five:  Estimating  Area.  Using  the  layout  information  (rela¬ 
tive  placement)  obtained  by  applying  the  rules  to  a  target  design,  the  area  estimator 
calculates  the  overall  area  requirement  for  the  design.  The  steps  are  shown  in  Fig¬ 
ure  13. 

Step  1.  Use  predicted  layout  information  to  produce 
an  estimated  layout. 

Step  2.  Estimate  height  and  width  of  estimated  layout 
due  to  individual  nodes  only. 

Step  3.  Add  in  estimated  height  and  width  of  wiring 
area  to  produce  total  area  estimate. 

Figure  13  Estimating  Area 

Step  1.  The  first  step  in  determining  the  overall  area  is  “growing”  an  estimated 
layout.  One  of  the  nodes  is  selected  as  the  “seed”  node.  This  node  is  placed  in  the 
middle  of  the  estimated  layout.  Other  nodes  are  placed  in  this  layout  according  to 
their  predicted  relative  placement.  Once  all  nodes  have  been  placed,  we  continue 
with  Step  2.  This  is  very  different  from  classical  placement  routines  because  there 
is  no  analysis  of  the  target  design.  The  only  information  available  is  the  relative 
placement  of  the  node  pairs. 

Step  2.  The  second  step  is  determining  the  contribution  of  only  the  nodes  (no 
interconnects)  to  the  overall  height  and  width.  We  obtained  the  height  and  width 
of  each  node  from  the  node  library  file.  The  height  for  each  row  in  the  estimated 
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layout  is  found  by  using  the  value  of  the  tallest  node  in  the  row  as  the  height  of  that 
row.  We  found  the  overall  width  by  adding  up  the  node  widths  in  each  row  in  the 
estimated  layout  and  using  the  width  of  the  widest  row. 

Step  3.  The  third  step  is  adding  in  the  area  due  to  the  wires.  A  very  simple 
formula  based  on  the  number  of  rows  and  columns  in  the  design  was  used  to  calculate 
an  estimate  of  the  required  wire  area.  More  rows  required  more  horizontal  wires  and 
more  columns  required  more  vertical  wires.  This  formula  produced  estimates  within 
20%.  (The  main  intent  here  is  to  show  that  it  is  possible  to  derive  good  area  estimates 
using  machine  learning  techniques.  We  expect  that  better  models  of  wire  area  will 
produce  better  results,  but  the  learning  results  will  be  the  same.) 

3.4  Summary 

This  chapter  first  presented  some  alternative  methods  to  modeling  layout  tools. 
It  then  showed  how  modeling  the  input-to-output  relationships  that  result  from  us¬ 
ing  layout  tools  was  the  best  choice.  This  chapter  went  on  to  describe  a  high-level 
overview  and  solution  architecture  for  this  method.  This  chapter  described  the  so¬ 
lution  architecture  as  a  generic  algorithm.  This  algorithm  uses  a  machine  learning 
system  to  model  the  relationships  and  to  apply  the  model  to  target  designs.  To  clar¬ 
ify  the  discussion,  this  chapter  also  described  some  examples  from  using  this  method 
to  model  an  actual  layout  tool.  However,  it  should  be  noted  here  that  the  intent 
of  this  chapter  was  to  introduce  and  describe  our  modeling  method  at  a  high  level 
without  going  into  detail.  The  next  chapter  expands  this  discussion  of  our  method 
by  describing  an  actual  implementation  in  detail. 
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4.0  AN  IMPLEMENTATION  USING  MACHINE  LEARNING 


This  chapter  describes  in  detail  an  implementation  of  the  area  and  delay  esti¬ 
mation  method.  It  does  this  by  following  the  solution  architecture  and  algorithm 
described  in  the  previous  chapter.  This  chapter  shows  how  this  method  is  used  to 
model  an  actual  layout  tool  using  real  training  designs  and  real  test  designs.  The 
entire  method  is  described  from  beginning  to  end,  starting  with  building  a  training 
set  and  finishing  with  estimating  area  and  delay.  This  chapter  also  discusses  the 
reasons  behind  the  implementations  chosen  for  each  of  the  procedures  in  the  method. 

Section  4.1  discusses  the  first  procedure  in  using  machine  learning  techniques  to 
model  the  input-to-output  relationships  from  using  layout  tools:  building  a  training 
set.  This  section  discusses  the  philosophy  behind  training  sets  by  describing  their 
function  and  basic  requirements.  This  section  also  gives  an  example  of  how  an  actual 
training  set  is  built  using  real  training  designs.  Section  4.2  describes  how  the  learning 
system  analyzes  a  training  set  and  produces  a  set  of  rules  that  models  the  target 
layout  tool’s  relationships.  This  section  also  describes  the  user-specified  biases  for 
the  learning  system.  An  example  of  using  the  learning  system  to  analyze  an  actual 
training  set  accompanies  and  illustrates  these  descriptions.  Section  4.3  describes  how 
the  model  (set  of  rules)  is  applied  to  real  designs.  This  section  shows  how  and  why 
the  set  of  rules  maps  the  relevant  design  features  into  layout  concepts.  To  clarify 
the  discussion,  this  section  shows  how  the  set  of  rules  are  applied  to  an  actual  test 
design  to  produce  a  set  of  layout  concepts.  Section  4.4  describes  how  the  estimator 
uses  these  layout  concepts  to  provide  delay  estimates.  This  section  shows  how  the 
layout  concepts  provide  wirelength  information  so  the  estimator  can  calculate  the 
delay  between  gates  in  the  design.  This  section  also  describes  how  the  estimator 
performs  a  search  of  all  of  the  input-to-output  paths  in  a  real  test  design  looking 
for  the  critical  path.  Delay  estimates  for  the  five  test  designs  are  presented  along 
with  their  percent  differences  from  the  actual  delay  values.  Section  4.5  describes  how 
area  estimates  are  derived  from  the  layout  concepts.  This  section  first  shows  how 
the  estimator  uses  layout  concepts  to  generate  an  estimated  layout.  Then  it  shows 


51 


how  the  estimator  uses  this  estimated  layout  to  produce  height  and  width  estimates. 
This  is  illustrated  by  using  the  estimator  on  a  set  of  layout  concepts  from  an  actual 
test  design.  The  height  and  width  estimates  for  each  of  the  test  designs  are  presented 
along  with  their  percent  differences  from  the  actual  height  and  width  values. 

The  target  layout  tool  in  all  of  these  examples  is  the  Artistll  layout  tool.  This 
tool  is  described  in  Section  2.2.2.  The  example  training  set  is  built  using  five  training 
designs:  a  Tbit  adder,  8-bit  adder,  12-bit  adder,  16-bit  adder,  and  32-bit  adder. 
The  test  designs  we  describe  in  detail  to  illustrate  applying  the  model  include  a  Tbit 
adder,  12-bit  adder,  24-bit  adder,  16-bit  absolute  value  operator,  and  4-bit  multiplier. 
The  test  designs  include  adders  because  the  training  set  is  built  from  adder  designs. 
The  other  test  designs,  the  absolute  value  operator  and  multiplier,  are  used  to  show 
how  an  adder-based  training  set  can  work  for  other  types  of  designs. 

4.1  Building  a  Training  Set 

A  training  set  forms  the  foundation  for  the  set  of  rules  that  models  the  target 
layout  tool.  The  learning  system  used  in  this  research,  due  to  its  general  nature, 
also  requires  a  description  of  the  domain  to  provide  a  context  to  the  information  in 
the  training  set,  but  only  the  information  in  this  training  set  is  used  to  produce  a 
set  of  rules.  Therefore,  the  training  set  must  be  complete  in  that  it  provides  enough 
information  so  a  representative  model  is  generated.  If  it  were  possible  to  have  an 
unlimited  training  set  where  all  possible  training  instances  were  included,  and  all 
possible  information  about  the  designs  were  included  in  the  training  instances,  then 
the  set  of  rules  would  provide  a  perfect  model.  However,  in  the  real  world,  perfection 
is  not  possible  because  of  natural  limitations.  It  is  not  possible  to  have  an  unlimited 
training  set  that  includes  all  information  on  the  designs.  It  is  possible,  on  the  other 
hand,  to  have  a  training  set  that  adequately  describes  a  sufficient  sample  set  of 
designs.  The  training  instances  include  enough  information  about  the  features  to 
yield  accurate  rules  and  the  training  set  has  enough  training  instances  to  yield  a 
sufficient  set  of  rules.  At  this  point  the  training  set  is  said  to  be  complete. 

This  research  uses  the  performance  of  the  resultant  model  as  a  measure  of  training 
set  completeness.  (This  is  briefly  described  in  Section  3. 3. 2.1.)  A  training  set  is 
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considered  complete  when  a  substantial  increase  to  its  size  does  not  increase  the 
performance  of  the  model.  (Here  we  define  size  as  a  function  of  the  amount  of 
information  in  each  training  instance  and  the  total  number  of  training  instances.) 
An  initial  training  set  is  built  and  the  performance  of  the  model  on  test  designs  is 
measured.  Then  more  information  is  added  to  the  training  set  and  another  model 
is  built.  The  performance  of  this  model  is  measured  and  compared.  This  process  is 
repeated  until  the  performance  of  the  model  starts  to  decrease  or  stops  increasing. 

For  this  research,  we  decided  to  start  out  using  as  much  relevant  information  as 
possible  in  each  training  instance.  The  purpose  of  a  training  instance  is  to  describe 
various  features  about  an  entity  and  to  show  the  correct  classification  of  that  entity.  In 
this  research,  the  entity  is  a  pair  of  gates  and  the  features  describe  the  individual  gates 
in  the  pair  and  their  relationships  to  each  other.  The  features  themselves  are  based 
on  graph  features  such  as  connectivity,  topology,  and  individual  node  attributes.  The 
Partial  Domain  Model  (PDM)  shown  in  Figure  14  shows  all  of  the  relevant  features. 
(Figure  14  is  the  same  as  Figure  8,  but  is  shown  here  again  for  easy  reference.) 
Because  we  are  interested  in  learning  layout  concepts  for  the  relative  placement  of 
two  gates,  each  training  instance  describes  features  about  the  individual  gates  and 
how  they  relate  to  each  other.  Therefore,  this  initial  PDM  lists  features  that  describe 
individual  node  attributes  and  relational  attributes  between  the  two  gates. 

The  first  line  shows  the  seven  relative  placement  concepts  we  want  the  learning 
system  to  learn  about.  (These  relative  placement  concepts  are  based  on  the  Manhat¬ 
tan  distance  between  gates.)  These  are  the  same  concepts  described  in  Section  3. 3. 2.1. 
More  concepts  could  be  used  to  handle  up  to  several  hundred  positions  if  necessary. 
However,  when  the  learning  system  has  to  learn  a  large  number  of  concepts,  the 
learning  process  degrades  considerably. 

To  improve  the  learning  process,  it  is  better  to  learn  fewer  concepts.  Fewer 
concepts  means  there  are  more  training  instances  for  each  concept.  More  training 
instances  provide  more  evidence  for  either  supporting  belief  or  disbelief  in  the  rule. 
More  evidence  improves  the  learning  results.  Having  fewer  concepts  also  means  that 
each  concept  must  have  a  wider  scope  in  order  to  cover  the  same  number  of  cases.  For 
example,  we  could  use  20  concepts  to  cover  20  relative  placements  with  each  concept 
covering  only  one  relative  placement  (i.e.,  a  concept  of  apart-5  could  indicate  the  two 
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near_s,  near.d,  close.l,  close_2,  far_l,  far_2,  way.out.  I  concepts 
log_conn:  yes,  no.  com_dest:  yes,  no. 

com_src:  yes,  no.  al_a2:  >,  =,  <. 

hl_h2:  >,  =,  <.  wl_w2:  >,  =,  <. 


nil_ni2 

V 

II 

A 

nxl_nx2: 

>,  =,  <. 

ntl_nt2 

V 

II 

A 

funl_fun2: 

>  ! ” • 

area_l : 

continuous . 

height. 1 : 

continuous . 

width. 1 

continuous. 

delay. 1 : 

continuous . 

funct.l 

ai2 , ai3 , ai4 , aoi21 , aoi21 1 , aoi22 , 

il. 

oai21 ,oai211,oai22,oi2,oi3,oi4. 

inputs. 1 : 

continuous. 

ext.l : 

continuous . 

int.l: 

continuous . 

llevl.l 

continuous . 

llev2.1: 

continuous . 

llev3_l 

continuous . 

llev4.1: 

continuous . 

21evl_l 

continuous . 

21ev2.1: 

continuous . 

21ev3_l 

continuous . 

21ev4.1: 

continuous . 

area_2 : 

continuous . 

height .2: 

continuous . 

width_2 

continuous . 

delay.2: 

continuous . 

funct_2 

ai2,ai3,ai4,aoi21 ,aoi211,aoi22. 

il, 

oai21,oai211,oai22,oi2,oi3,oi4. 

inputs.2: 

continuous . 

ext_2: 

continuous . 

int_2 : 

continuous . 

llevl_2 

cont inuous . 

llev2.2: 

continuous . 

llev3_2 

continuous . 

llev4.2: 

continuous . 

21evl_2 

continuous . 

21ev2_2; 

continuous . 

21ev3_2 

continuous . 

21ev4.2: 

continuous . 

Figure  14  Initial  Partial  Domain  Model 

gates  are  placed  five  positions  away  from  each  other,  where  a  position  is  a  possible 
gate  location).  An  alternative  is  to  use  fewer  concepts,  but  cover  more  relative 
placements  with  each  one.  For  example,  we  could  cover  up  to  20  relative  placements 
with  only  five  concepts  if  each  concept  covered  four  relative  placements  (i.e.,  a  concept 
of  apart.5.8  could  indicate  the  two  gates  are  placed  either  five,  six,  seven,  or  eight 
positions  apart).  We  have  covered  the  same  number  of  relative  placements,  but  used 
only  five  concepts. 

In  this  implementation,  we  used  seven  concepts  to  cover  all  of  the  relative  place¬ 
ments.  The  concept  of  near^s  indicates  the  two  gates  are  located  next  to  each  other 
on  the  same  row.  The  concept  of  neared  indicates  the  two  gates  are  next  to  each 
other,  but  on  different  rows.  In  other  words,  one  gate  is  right  above  or  below  the 
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other.  The  concept  of  close^l  indicates  the  two  gates  are  located  two  or  three  posi¬ 
tions  apart  from  each  other,  on  the  same  row  or  on  different  rows.  The  concept  of 
close^2  indicates  the  gates  are  located  four  or  five  positions  apart.  The  concept  of 
farA  indicates  the  two  gates  are  six  or  seven  positions  apart.  The  concept  of  farA 
indicates  the  two  gates  are  eight  or  nine  positions  apart.  The  concept  of  way-out 
indicates  the  two  gates  are  10  or  more  positions  apart. 

It  turns  out  that  by  using  design  features  based  on  graph  applications,  the  gate 
pairs  within  10  positions  of  each  other  produce  more  rules  than  gates  further  apart. 
For  example,  as  described  later  in  this  section,  the  learning  system  produced  35  rules 
based  on  the  training  set  in  this  implementation.  There  were  1144  training  instances 
in  this  training  set,  with  585  of  them  having  the  concept  of  way-out.  This  means 
more  than  half  of  the  training  set  had  relative  placements  of  10  or  more  positions 
apart.  However,  only  eight  of  the  35  rules  predicted  a  concept  of  way-out.  As  a 
comparison,  seven  rules  were  produced  for  the  concept  of  nears  based  on  only  82 
training  instances.  Nine  rules  were  produced  for  the  concept  of  close.  1  based  on  123 

training  instances.  Five  rules  were  produced  for  the  concept  of  close-2  based  on  130 

training  instances,  and  six  rules  were  produced  for  the  concept  of  far.l  based  on  119 

training  instances.  No  rules  were  produced  for  the  remaining  concepts  of  near.d  and 

far-2  based  on  a  total  of  105  training  instances.  This  shows  that  a  large  number 
of  training  instances  supporting  a  particular  concept  does  not  necessarily  produce 
more  rules.  In  addition,  because  rules  are  only  produced  when  there  is  substantial 
evidence  in  the  data  to  support  the  rules,  this  shows  there  is  more  evidence  linking 
the  design  features  to  concepts  for  relative  placements  less  than  10  positions  apart. 
Therefore,  we  decided  to  concentrate  on  learning  about  the  gate  pairs  that  are  within 
10  positions  of  each  other.  The  seven  concepts  handle  this  sufficiently. 

Another  reason  we  can  concentrate  on  gate  pairs  within  10  positions  of  each 
other  is  that  gates  further  away  have  very  little  effect  on  each  other  as  far  as  relative 
placement.  Of  course,  if  they  are  connected,  the  wirelength  would  start  to  get  long 
and  the  delay  of  the  driving  gate  would  increase.  However,  using  rules  to  predict  the 
relative  placement  of  gates  based  on  gate  pairs  with  large  distances  between  them  is 
not  necessary.  In  practice,  each  gate  in  the  pair  is  usually  connected  to  another  gate 
which  is  closer.  A  relative  placement  based  on  a  pairing  with  this  other  gate  would 
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provide  better  results  because  the  gates  are  closer  and  have  more  effect  on  each  other. 

The  results  in  Chapter  5  shows  that  this  approach  holds,  even  for  large  designs. 
When  gate  pairs  that  exhibit  the  proper  features  are  placed  according  to  one  of  these 
seven  concepts,  gates  that  should  be  further  apart  end  up  as  such.  This  approach 
also  allows  us  to  handle  large  designs  as  well  as  small  ones. 

The  ability  to  represent  the  necessary  layout  information  with  only  seven  concepts 
is  also  an  example  of  how  “black  and  white”  classifications  are  not  necessary  in  this 
method.  Classifying  a  gate  pair  as  being  either  two  or  three  positions  apart  introduces 
a  gray  area  into  the  modeling  process.  However,  it  turns  out  the  difference  between 
these  positions  is  small  enough  that  it  has  little  effect  on  the  area  and  delay  estimates. 

It  should  be  noted  here  that  overlapping  the  concepts  (having  more  than  one 
concept  include  the  same  number  of  positions  apart)  will  introduce  confusion  into 
the  learning  process.  When  the  learning  system  analyzes  the  training  set,  it  needs 
to  have  one  and  only  one  correct  answer  for  each  of  the  training  instances.  Having 
more  than  one  correct  answer  would  produce  conflicting  rules.  In  turn,  the  accuracy 
of  the  area  and  delay  estimates  would  decrease. 

The  other  information  in  the  PDM  describes  attributes  about  the  two  gates  in 
the  gate  pair.  The  first  10  features  describe  how  the  two  gates  relate  to  each  other. 
The  feature  log-Conn  indicates  if  the  two  gates  are  logically  connected.  A  gate  pair  is 
considered  to  be  logically  connected  if  one  gate  in  the  pair  either  provides  an  input 
to  the  other  gate  or  receives  an  output  from  the  other  gate.  The  feature  corri-dest 
indicates  if  the  two  gates  have  a  common  destination  (i.e.,  both  gates  provide  inputs 
to  the  same  gate).  The  feature  comsrc  indicates  if  both  gates  have  inputs  from  a 
common  source  (i.e.,  both  gates  share  the  output  of  the  same  gate).  The  features 
al^a2,  hlJi2,  and  wLw2  indicate  if  the  first  gate  has  more,  equal,  or  less  area, 
height,  and  width  than  the  second  gate  in  the  pair.  The  features  ntLnt2,  nxLnx2, 
and  niLni2  indicate  if  the  first  gate  has  more,  equal,  or  less  internal,  external,  and 
total  inputs  than  the  second  gate.  The  feature  funLfun2  indicates  if  the  functions 
of  the  two  gates  are  the  same  or  different.  Each  of  these  10  features  is  a  symbolic 
type.  The  first  three  features  have  the  symbols  yes  and  no  as  admissible  values.  The 
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next  six  features  have  the  symbols  >,  =,  and  <  as  admissible  values.  The  feature 
funl-fun2  has  the  symbols  ==  and  !—  as  admissible  values. 

The  remaining  features  describe  attributes  about  the  individual  gates.  Any  fea¬ 
ture  with  a  “-1”  extension  describes  an  attribute  about  the  first  gate  listed  in  the 
training  instance.  Any  feature  with  a  “^2”  extension  describes  an  attribute  about  the 
first  gate  listed  in  the  training  instance.  The  features  area,  height,  width,  and  delay 
indicate  the  area,  height,  width,  and  intrinsic  delay  of  the  individual  gate.  These 
features  are  a  continuous  type.  Any  real  number  is  admissible.  The  feature  funct 
indicates  the  function  of  the  individual  gate.  This  feature  is  a  symbolic  type  with  all 
of  the  possible  primitive  gates  as  admissible  values.  The  features  inputs,  ext,  and  int 
indicate  how  many  total  inputs,  external  inputs,  and  internal  inputs  the  individual 
gate  has.  These  three  features  are  a  continuous  type. 

The  next  eight  features  describe  the  topology  of  the  design  in  relation  to  each  of 
the  individual  gates  in  the  training  instance.  We  wanted  to  know  if  gates  that  are 
one  or  two  levels  away  affected  the  relative  placement  of  the  two  gates  in  the  gate 
pair.  These  topology  features  are  of  the  form  xlevy  where  x  is  the  number  of  levels 
away  and  y  is  the  number  of  inputs.  The  continuous  values  for  these  features  indicate 
how  many  gates  that  are  x  levels  away  have  y  inputs.  For  example,  if  the  feature 
2lev3-l  has  a  value  of  five,  then  two  levels  away  from  the  first  gate  in  the  gate  pair 
there  are  a  total  of  five  gates  with  three  inputs  each.  The  topology  description  was 
set  up  this  way  so  the  rules  could  indicate  if  there  were  first  or  second  order  effects 
from  other  gates  in  the  layout  on  the  two  gates  in  the  training  instance. 

Now  that  we  have  described  all  of  the  relevant  information  in  a  individual  training 
instance,  the  next  consideration  in  a  training  set  is  the  number  of  training  instances. 
The  goal  here  as  well  is  to  have  a  large  enough  training  set  so  the  performance  of 
the  model  is  not  increased  by  a  substantial  increase  to  the  training  set.  The  initial 
training  set  was  made  up  of  all  possible  gate  pairs  from  the  five  adders  used  as  training 
designs.  Four  of  these  adders  have  the  most  common  bit  widths:  4,  8,  16,  and  32. 
One  of  these  adders  has  an  uncommon  bit-width  of  12  just  to  include  some  evidence 
for  odd  designs.  Each  design  requires  a  different  size  layout  with  different  physical 
dimensions.  These  designs  also  exhibit  a  wide  variety  of  the  features  listed  in  the 
PDM.  These  five  designs  provided  107,690  training  instances.  However,  the  version  of 
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the  learning  system  we  were  using  was  unable  to  handle  this  many  training  instances, 
so  we  had  to  remove  the  largest  training  design,  the  32-bit  adder.  This  reduced  the 
number  of  training  instances  down  to  25,069.  It  is  better  to  remove  an  entire  set  of 
training  instances  based  on  one  training  design  than  to  remove  only  some  of  them. 
This  is  because  each  training  design  adds  its  own  unique  set  of  characteristics  and 
this  set  would  be  incomplete  if  all  training  instances  are  not  included.  As  it  turns 
out,  this  reduced  number  of  training  instances  was  more  than  enough  to  produce 
a  good  model.  Later  in  this  section,  we  describe  how  this  training  set  was  further 
reduced  to  an  even  more  manageable  size. 

The  remainder  of  this  section  describes  how  to  build  a  training  set  by  following 
the  procedure  and  actually  building  one.  The  first  step  in  building  a  training  set 
is  selecting  a  set  of  training  designs.  Our  example  training  set  now  consists  of  four 
training  designs:  a  Tbit  adder,  8-bit  adder,  12-bit  adder,  and  16-bit  adder.  The 
reasons  for  these  designs  is  explained  in  the  previous  paragraph. 

The  second  step  is  pre-processing  each  training  design  into  a  format  the  target 
layout  tool  can  handle.  For  the  Artistll  layout  tool,  a  design  must  be  in  the  GLUE 
format.  As  shown  in  the  solution  architecture  in  Figure  15,  we  start  out  with  a  VHDL 
description  (.VHDL)  of  each  training  design.  (Figure  15  is  the  same  as  Figure  6.  It 
is  copied  here  for  easier  reference.)  We  started  with  VHDL  descriptions  because 
we  want  the  designs  to  be  in  a  general,  technology  independent,  and  layout  tool 
independent  format.  In  fact,  we  were  able  to  use  these  same  designs  as  training 
designs  to  model  the  other  layout  tool,  TimberWolf.  Each  training  design  is  compiled 
into  an  Intermediate  VHDL  Format  (.IVF)  using  VCOMP.  A  tool  called  IVF2GLUE 
translates  each  of  the  .IVF  files  into  a  GLUE  format  (.GLU)  file.  Once  all  training 
designs  are  in  the  GLUE  format,  we  can  proceed  to  the  next  step. 

The  third  step  is  producing  actual  layouts  for  each  of  the  training  designs.  For 
the  training  designs  in  this  example,  we  ran  Artistll  using  10,000  iterations.  This 
number  of  iterations  is  based  on  information  from  the  author  of  Artistll,  R.  Owens, 
for  the  number  of  gates  in  these  designs.  These  are  small  designs  so  it  did  not  take 
too  long  to  get  the  layouts.  The  execution  times  are  shown  in  Table  4.  For  each 
training  design,  Artistll  produced  a  Magic  layout  file  (.MAG)  and  a  grouping  file 
(.GRP).  (The  MAGIC  layout  file  is  only  used  for  test  designs  to  get  the  actual  area 
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Delay  Area 


Area  &  Delay 
Estimates 


Design  Actual 

Pre-processing  Layout 

(Common  to  all  paths)  (CPU  Hours) 


Learning  Applying 

Rules  Rules 

(Off-line)  (CPU  Seconds) 


Figure  15  Solution  Architecture 


and  provide  electrical  information  to  IRSIM  to  get  the  actual  critical  path  delay.)  The 
grouping  file  provides  information  on  the  relative  placements  of  the  primitive  gates 
in  the  design.  Because  each  design  is  a  different  size,  Artistll  produced  different  size 
layouts.  Being  different  in  this  manner,  each  of  these  layouts  provides  different  layout 
information  to  the  training  set.  The  number  of  rows  and  columns  for  these  layouts 
are  also  shown  in  Table  4.  We  wanted  to  keep  the  layouts  as  square  as  possible,  with 
an  equal  or  close  to  equal  number  of  rows  and  columns  when  possible. 


Table  4  Execution  Times  for  Artistll  on  Training  Designs 


Design 

Name 

Number 
of  Gates 

Number  of 

Transistors 

Number 
of  Rows 

Number 
of  Columns 

Execution 
Time  (CPU  sec) 

add4 

43 

146 

8 

8 

840 

adds 

95 

330 

8 

16 

2278 

addl6 

199 

698 

16 

16 

8278 

add32 

407 

1434 

32 

16 

18735 
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The  fourth  step  is  transforming  data  from  the  training  designs  and  their  layouts 
into  a  training  set  format.  To  make  it  easier  to  parse  through  the  training  design 
description,  we  used  a  tool  called  Clusterll  to  flatten  the  GLUE  description.  (The 
GLUE  file  can  be  hierarchical  and  very  difficult  to  parse  directly.)  As  described 
in  Section  3. 3. 2.1,  the  primary  function  of  Clusterll  is  to  produce  an  initial  gate 
clustering  to  improve  the  layouts.  However,  Clusterll  can  also  produce  a  flattened 
file  (.FLA)  description  of  the  design’s  GLUE  file.  We  did  not  use  Clusterll  to  improve 
the  layouts  because  we  wanted  to  model  Artistll  strictly  by  itself.  In  addition,  we 
manually  checked  the  input  GLUE  files  and  the  resultant  flattened  files  to  ensure 
that  no  optimization  or  other  processing  occurred. 

A  software  tool  called  Match  (written  for  this  research)  parses  through  each  train¬ 
ing  design’s  flattened  description  file  to  get  information  on  each  of  the  primitive  gates 
and  the  overall  connectivity  and  topology  of  the  design.  This  information  allows  the 
Match  tool  to  generate  a  graphical  representation  of  the  design.  The  graphical  rep¬ 
resentation  is  stored  in  an  adjacency  matrix.  This  matrix  shows  which  gate  pairs 
are  connected.  We  can  start  at  any  gate  and  traverse  through  this  matrix  to  get  the 
topology  of  the  design  in  relation  to  that  gate.  (We  also  use  this  matrix  to  get  the 
critical  path  delay.  This  is  explained  later  in  Section  4.4.) 

Information  on  each  of  the  gates  is  stored  in  a  record.  This  information  includes 
the  gate’s  name  (identifier),  function,  number  of  external  inputs  (from  the  outside 
world),  number  of  internal  inputs  (from  within  the  design),  height,  width,  area, 
intrinsic  delay  (for  both  rising  and  falling  output  transitions),  and  whether  or  not 
the  gate  is  an  input  gate  or  output  gate.  An  input  gate  has  external  inputs  and  an 
output  gate  has  an  output  to  the  outside  world.  In  addition,  in  the  form  of  linked 
lists,  this  record  holds  information  about  the  gates  that  are  one  and  two  levels  away. 
This  is  the  topology  information  described  in  the  PDM  file  in  Figure  14. 

Each  gate’s  height,  width,  area,  and  intrinsic  delay  come  from  a  physical  charac¬ 
teristics  library.  This  library  holds  physical  information  for  each  primitive  gate  the 
layout  tool  can  handle.  The  rest  of  the  information  comes  from  the  design’s  flattened 
description  file.  For  each  training  instance,  the  gate  description,  connectivity,  and 
topology  information  provides  values  for  the  features  describing  each  gate  in  the  gate 
pair  and  for  describing  how  the  gates  relate  to  each  other.  The  Match  tool  parses 
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the  .GRP  file  to  obtain  the  relative  placement  information  from  the  design’s  layout. 
This  information  provides  the  correct  concept  for  each  training  instance. 

Now  that  we  have  built  a  training  set,  we  need  to  determine  if  it  is  complete  or  if 
it  needs  more  or  less  information.  This  requires  having  the  learning  system  train  on  it 
and  produce  a  set  of  rules.  This  also  requires  using  these  rules  on  several  test  designs 
and  measuring  the  performance.  We  did  this  for  the  training  set  using  all  25,069 
training  instances  and  all  features  described  by  the  PDM  file  in  Figure  14.  We  used 
the  model  on  the  test  designs  shown  in  Table  5.  (This  table  just  shows  the  design 
name,  function,  and  the  number  of  gates  in  the  design.  The  results  are  described  in 
Section  5.2).  It  turns  out  the  few  rules  based  on  the  topology  features  had  very  low 
certainty  factors  (<  0.400),  so  we  eliminated  them.  In  addition,  eliminating  these 
features  from  the  training  set  did  not  affect  the  model’s  performance.  The  individual 
gate  features  did  not  produce  rules  with  high  certainty  factors  either,  so  we  eliminated 
these  features.  Once  again,  eliminating  these  features  had  no  impact  on  the  model’s 
performance. 


Table  5  Set  of  Test  Designs 


Design 

Name 

Function 

Number 
of  Gates 

add4 

4-bit  adder 

43 

addl2 

12-b  adder 

147 

add24 

24-b  adder 

303 

absl6 

16-b  absolute  value 

208 

inc4 

4-b  incrementer 

23 

inc8 

8-b  incrementer 

51 

incl6 

16-b  incrementer 

107 

sub4 

4-b  subtracter 

48 

sub8 

8-b  subtracter 

104 

subl6 

16-b  subtracter 

216 

addsub4 

4-b  adder /sub 

64 

addsubS 

8-b  adder/sub 

132 

addsublG 

16-b  adder/sub 

268 

mult2 

2-b  multiplier 

40 

mult4 

4-b  multiplier 

163 

mult  6 

6-b  multiplier 

389 

mult  8 

8-b  multiplier 

711 

mult  16 

16-b  multiplier 

2959 
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We  were  also  able  to  reduce  the  number  of  training  instances  in  the  training  set. 
Initially,  we  generated  training  instances  for  all  possible  gate  pairs.  We  thought  it 
would  be  important  to  learn  how  each  gate  related  to  each  of  the  other  gates.  How¬ 
ever,  after  applying  the  model  to  the  test  designs  in  Table  5,  we  found  we  only  needed 
to  know  relative  placement  concepts  for  gate  pairs  that  were  logically  connected.  In 
the  case  of  delay,  the  estimator  calculates  delay  as  a  function  of  wirelength  between 
two  gates.  If  these  two  gates  are  not  logically  connected,  there  is  no  wire  and  no 
delay.  The  wirelength  comes  from  the  predicted  relative  placement  concept  for  the 
two  gates,  therefore  we  do  not  need  to  know  the  relative  placement  of  unconnected 
gates  for  calculating  delay.  In  the  case  of  area,  we  have  the  information  we  need 
from  a  list  of  all  logically  connected  gate  pairs.  This  is  because  this  list  must  include 
each  gate  in  the  design  in  at  least  one  gate  pair.  So  each  gate  must  have  a  relative 
placement  concept  associating  it  with  at  least  one  other  gate  in  the  estimated  layout. 
By  not  generating  training  instances  for  gate  pairs  that  were  not  logically  connected, 
we  reduced  the  size  of  the  training  set  immensely. 

For  these  reasons,  we  were  able  to  reduce  the  PDM  file  shown  in  Figure  14  down  to 
the  PDM  file  shown  in  Figure  16.  We  were  also  able  to  reduce  the  number  of  training 
instances  from  25,069  down  to  only  616.  At  this  point,  we  were  able  to  reinstate 
the  training  instances  from  the  32-bit  adder.  Because  we  now  consider  only  logically 
connected  gate  pairs,  the  32-bit  adder  provides  only  528  training  instances,  instead 
of  82,621.  Adding  the  32-bit  adder  information  increased  the  number  of  training 
instances  in  the  training  set  to  1144. 

The  first  line  in  the  PDM  file  in  Figure  16  shows  we  used  the  same  relative 
placement  concepts  as  in  Figure  8.  However,  we  eliminated  the  log^conn  feature 
because  we  only  consider  logically  connected  gate  pairs.  The  next  two  features, 
com^dest  and  comsrc  remained  intact.  For  the  other  features,  only  the  admissible 
values  were  changed.  Instead  of  looking  for  the  symbolic  values  of  <,  and  >,  we 
now  look  for  ==  (equal)  or  !=  (not  equal).  This  made  the  features  more  general 
and  less  dependent  on  the  order  of  gates  in  the  gate  pair.  The  last  feature  name  was 
added  to  identify  the  gates  in  the  gate  pair.  It  has  an  admissible  feature  of  ignore 
because  it  is  only  an  identifier  feature  and  we  do  not  want  it  used  in  any  rules.  Now 
that  we  have  explained  how  we  scaled  down  to  this  reduced  PDM  file  by  eliminating 
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features  and  generalizing  others,  we  can  describe  actual  training  instances. 

near_s,  near_d,  close. 1,  close_2,  far_l,  far_2,  way.out.  |  concepts 

com.dest:  yes,  no. 

com.src:  yes,  no. 

al_a2:  ==,  !=. 

hl_h2:  ==,  !=. 

wl_w2:  ==,  !=. 

ntl_nt2;  ==,  !=. 

nxl_nx2:  ==,  !=. 

nil_ni2:  ==,  !=. 

funl_fun2:  ==,  !=. 

name ;  ignore . 

Figure  16  Reduced  Partial  Domain  Model 

Figure  17  shows  some  training  instances  based  on  the  reduced  PDM  file.  These 
training  instances  come  from  the  first  training  design,  the  4-bit  adder.  Each  training 
instance  lists  the  values  in  the  same  order  as  the  features  in  the  PDM  file.  The  next 
to  last  value  is  the  gate  pair  identifier.  The  first  number  is  the  number  of  the  first 
gate  in  the  pair  and  the  second  number  is  the  number  of  the  second  gate.  The  gate 
numbers  are  based  on  the  order  of  gates  in  the  flattened  description  file.  The  first 
gate  described  in  this  file  is  gate  0.  These  numbers  are  also  used  in  the  grouping  file 
to  indicate  the  gate’s  position  in  the  layout.  The  last  value  in  the  training  instance 
is  the  correct  concept  for  that  training  instance.  For  example,  the  first  training 
instance  shows  that  gates  0  and  16  (the  first  two  gates  in  the  training  design  that  are 
logically  connected)  do  not  have  a  common  destination  or  common  source.  (The  gate 
identifiers  come  from  the  order  of  gates  in  the  training  design’s  flattened  description 
file.  Gate  0  is  the  first  gate  listed  and  gate  16  is  the  17th  gate  listed.)  The  remaining 
values  in  this  training  instance  show  the  two  gates  are  equal  in  area,  height,  width, 
number  of  total  inputs,  number  of  external  inputs,  and  number  of  internal  inputs. 
They  also  have  the  same  function.  The  layout  concept  for  these  two  gates  is  close.2. 
This  indicates  that  in  the  actual  layout  for  the  4-bit  adder,  gates  0  and  16  are  four  or 
five  positions  apart.  The  rest  of  the  training  set  (not  shown  here  due  to  its  length) 
includes  training  instances  from  the  4-bit  adder,  8-bit  adder,  12-bit  adder,  16-bit 
adder,  and  32-bit  adder.  The  Match  tool  produces  a  set  of  training  instances  for  each 
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of  these  designs,  one  at  a  time.  The  training  set  is  a  concatenation  of  all  the  training 
instances. 


no , no ,==,==,==,==,==,==  j  ==, 0/ 16 , close_2 . 
yes,no,==, !=,!=,!=,!=,==, ! =,0/22, close. 1 . 
no,no,==, !=,!=,!=,==,!=, !=,0/23,far_l. 
no, no, !=,!=,!=,!=,!=, !=, !=,0/25,n6ar_s . 
no , no ,==,==,==,==,==,==,==, 1/23 , f  ar_ 1 . 
no, no, ==,!=,!=, !=,==, !=,!=, 1/24, near.s. 
yes, no, ==,==, ==,==, !=,!=, ! =,1/31, near. d. 
no , no ,==,==,==,==,==,==,==, 1/32 , close.2 . 
no, no, !=,!=,!=,!=, !=,==, !=,l/34,far_2. 
no, no, ==,==,==,==,==,==,==, 2/32, far.l. 
no, no, ==,!=,!=,!=,==, !=, !=, 2/33, close. 1. 


Figure  17  Example  Training  Instances 

The  fifth  step  is  to  build  other  training  sets  if  we  need  to  model  the  target  layout 
tool  using  different  configurations.  For  this  example,  we  only  wished  to  model  Artistll 
at  10,000  iterations  so  we  do  not  need  to  build  other  training  sets. 

This  section  described  the  reasons  behind  a  training  set  and  how  to  actually 
build  one.  The  next  section  describes  how  and  why  the  learning  system  analyzes  the 
example  training  set  to  produce  a  set  of  rules. 

4.2  Analyzing  the  Training  Set 

The  learning  system  produces  a  set  of  rules  by  learning  about  relationships  be¬ 
tween  the  features  and  the  concepts  in  the  training  set.  Therefore,  the  first  step  in 
analyzing  the  training  set  is  determining  if  there  are  any  relationships  between  the 
design  features  and  the  layout  concepts.  These  relationships  are  represented  by  rules. 
The  learning  system  finds  these  rules  by  performing  a  general-to-specific  search  of 
the  rule  space  for  the  rules  describing  each  concept.  The  rule  space  for  each  concept 
is  structured  as  a  tree.  Each  concept  has  its  own  tree.  The  root  of  this  tree  is  the 
most  general  rule:  every  instance  belongs  to  the  concept.  The  tree  branches  out  as 
the  rules  are  specialized.  The  rules  are  specialized  by  either  adding  a  conjunct  to  the 
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left-hand  side  or  making  an  existing  conjunct  more  specific.  Restricting  a  numeric 
range  or  replacing  a  value  with  a  more  specific  value  makes  conjuncts  more  specific. 

Because  we  are  using  certainty  factors  as  search  parameters,  the  rules  are  made 
more  specific  until  rules  that  satisfy  the  certainty  factor  threshold  are  found.  These 
rules  are  considered  acceptable  rules  and  are  added  to  the  rule  set.  (Acceptable  rules 
are  those  rules  that  satisfy  the  user-specified  biases  for  the  learning  system.)  When 
all  acceptable  rules  have  been  found,  the  rules  are  evaluated  by  applying  them  to  the 
training  set.  This  determines  the  coverage  and  the  performance.  (This  performance 
is  really  the  apparent  error  rate  as  described  in  Section  2.3.)  In  this  context,  coverage 
means  what  percentage  of  the  training  instances  match  the  left-hand  side  of  one  or 
more  rules.  A  coverage  of  100%  means  that  each  training  instance  fires  at  least  one 
rule.  The  performance  indicates  how  well  the  rules  predict  the  correct  concepts. 
A  performance  of  75%  means  three-fourths  of  the  predicted  concepts  are  correctly 
predicted. 

For  our  example,  we  started  out  with  a  certainty  factor  threshold  of  0.9.  (The 
author  of  the  Rule  Learner  system,  F.J.  Provost  indicated  that  starting  at  0.9  and 
decreasing  by  0.1  would  provide  the  best  results  for  an  implementation  using  certainty 
factors.)  A  threshold  value  of  0.9  means  only  those  rules  whose  certainty  factor  equals 
or  exceeds  0.9  are  acceptable  rules.  However,  no  acceptable  rules  were  found  for  a 
threshold  of  0.9.  We  then  lowered  the  threshold  to  0.8.  This  time  eight  rules  were 
found.  However,  these  eight  rules  only  covered  5.4%  of  the  training  instances  and 
predicted  correct  concepts  for  only  4.5%  of  those  covered.  We  wanted  to  have  all 
(100%)  training  instances  covered,  so  we  lowered  the  threshold,  this  time  to  0.7. 
The  learning  system  found  21  acceptable  rules  that  covered  25.2%  of  the  training 
instances,  with  12.8%  correct.  So  we  lowered  the  threshold  to  0.6%.  This  produced 
35  rules  with  100%  coverage  and  33.0%  of  the  training  instances  correctly  predicted. 
For  explanation  purposes.  Figure  18  shows  five  of  the  rules  from  this  set. 

The  first  rule  says  that  if  the  two  gates  do  not  have  a  common  destination  and 
do  have  a  different  number  of  internal  inputs,  then  their  predicted  concept  is  nears. 
The  statistics  in  this  rule  indicate  87%  of  the  training  instances  with  the  concept  of 
nears  satisfy  these  conditions  (no  common  destination,  different  number  of  internal 
inputs).  The  statistics  also  indicate  only  52%  of  the  training  instances  with  concepts 
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(com_dest  no)  (nil_ni2  !=)  ==>  near.s 

{  pos=  0.87,  neg=  0.52,  cf=  0.623 

p=71,  n=547,  tp=82,  tn=1062  } 

(coin_dest  no)  (funl_fun2  !=)  ==>  near.s 

{  pos=  0.88,  neg=  0.57,  cf=  0.603 

p=72,  n=603,  tp=82,  tn=1062  } 

(hl_h2  !=)  (nil_ni2  !=)  ==>  close.l 

{  pos=  0.86,  neg=  0.56,  cf=  0.602 

p=106,  n=574,  tp=123,  tn=1021  } 

(com.dest  no)  (nxl_nx2  !=)  (funl_fun2  !=)  ==>  close_l 

{  pos=  0.51,  neg=  0.33,  cf=  0.602 

p=63,  n=339,  tp=123,  tn=1021  } 

(com.dest  yes)  (al_a2  !=)  ==>  close_2 

■C  pos=  0.15,  neg=  0.05,  cf=  0.735 

p=19,  n=48,  tp=130,  tn=1014  } 

(nxl_nx2  ==)  (funl_fun2  !=)  ==>  far_l 

■C  pos=  0.44,  neg=  0.28,  cf=  0.603 

p=52,  n=288,  tp=119,  tn=1025  } 

(com_dest  yes)  (nxl_nx2  !=)  ==>  way.out 

{  pos=  0.10,  neg=  0.03,  cf=  0.791 

p=58,  n=14,  tp=585,  tn=559  } 


Figure  18  Example  Set  of  Rules  No.  2 

other  than  nears  satisfy  these  conditions.  So  this  rule  has  a  .623  certainty  factor 
according  to  Equation  3-2.  It  should  be  noted  here  that  this  certainty  factor  is  not 
a  probability  of  the  rule’s  truth.  It  is  a  value  that  indicates  the  learning  system’s 
belief  in  the  rnle  (based  on  the  training  set)  and  should  only  be  used  to  rank  order 
this  rnle  with  other  rules  indicating  the  same  concept. 

The  second  step  in  analyzing  the  training  set  is  taken  if  no  relationships  are  found. 
We  would  have  to  adjust  some  of  the  learning  system’s  biases  and  try  analyzing  the 
training  set  again.  In  the  example  case,  we  used  a  beam  width  of  50  for  the  beam 
search  strategy.  We  also  allowed  6  conjuncts  in  the  left-hand  side  of  the  rnles.  This 
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means  that  up  to  6  features  (two-thirds  of  the  possible  features)  could  be  listed 
as  conditions.  These  bias  settings  produced  35  acceptable  rules.  If  no  rules  were 
produced,  we  could  have  used  a  wider  beam  width  and/or  allowed  more  conjuncts. 
Then,  if  the  learning  system  still  produced  no  rules,  we  would  have  to  build  and  use 
another  training  set. 

The  third  step  is  building  a  set  of  rules  (the  model)  describing  the  relationships 
that  were  found.  The  learning  system  does  this  automatically.  The  set  of  rules  we 
obtained  for  our  example  is  a  model  of  Artistll  at  10,000  iterations  for  different  size 
adder  designs.  Applying  these  rules  to  other  adder  designs  is  explained  in  the  next 
section. 


4.3  Applying  the  Model 

In  this  section,  we  explain  in  detail  how  we  use  the  layout  tool  model  to  obtain 
layout  information.  We  applied  the  model  to  five  test  designs  to  illustrate  how  to 
apply  the  model.  (In  the  next  two  sections,  we  use  these  same  five  designs  to  illustrate 
how  to  evaluate  the  performance  of  the  model.)  Test  designs  are  treated  the  same  as 
target  designs,  except  they  have  associated  layout  information  obtained  from  actual 
layouts  as  a  reference.  The  test  designs  we  used  from  Table  5  are  the  12-bit  adder, 
24-bit  adder,  16-bit  absolute  value  operator,  and  4-bit  multiplier.  (We  included  the 
4-bit  multiplier  to  test  the  hypothesis  that  rules  learned  from  adders  are  applicable 
to  multipliers  as  well.) 

The  main  idea  behind  the  layout  tool  model  is  that  it  represents  the  target  layout 
tool  in  a  form  a  high-level  synthesis  system  can  use  to  get  area  and  delay  estimates. 
The  synthesis  system  applies  the  model  to  a  target  design  to  get  layout  information 
from  which  the  system  can  derive  area  and  delay  estimates.  To  do  this,  the  model  is 
based  on  how  the  target  layout  tool  places  logically  connected  gate  pairs  relative  to 
each  other.  When  applied  to  a  target  design,  the  model  produces  a  list  of  predicted 
relative  placements  for  the  logically  connected  gates.  As  described  in  Section  4.5 
and  Section  4.4,  this  list  provides  placement  information  for  estimating  area  and 
wirelength  information  for  estimating  delay. 


67 


Now  that  we  have  explained  the  theory  behind  applying  the  model,  we  can  illus¬ 
trate  with  actual  examples.  The  first  step  in  applying  the  model  is  pre-processing 
the  test  designs  into  a  format  so  we  can  apply  the  rules.  This  format  is  the  same 
as  the  training  set  format,  but  without  the  correct  concepts.  Therefore,  we  have 
to  first  transform  the  test  designs  from  their  VHDL  descriptions  into  GLUE  format 
descriptions.  Then  we  use  Clusterll  to  get  the  flattened  file  description.  (Once  again, 
Clusterll  does  not  change  anything  while  it  is  flattening  the  GLUE  file  description.) 
Pre-processing  the  designs  in  this  way  keeps  our  method  consistent  because  the  Match 
tool  can  treat  test  designs  the  same  as  training  designs. 

However,  because  these  are  test  designs,  we  used  Artistll  to  produce  reference 
layouts  for  each  one.  We  used  the  same  number  of  iterations  of  Artistll  (10,000)  as 
we  did  for  the  training  designs.  Table  6  shows  the  five  test  designs  and  the  number 
of  rows  and  columns  in  the  actual  layout  for  each  one.  We  tried  to  keep  the  layout  as 
square  as  possible  with  an  equal  or  close  to  equal  number  of  rows  and  columns.  We 
relaxed  this  restriction  for  the  multiplier  to  see  if  it  affected  the  estimates  and  found 
out  that  it  had  no  effect. 

Table  6  Test  Design  Layout  Information 


Design 

Name 

Number 
of  Gates 

Number  of 
Transistors 

Number 
of  Rows 

Number 
of  Columns 

add4 

43 

146 

8 

8 

addl2 

147 

514 

16 

16 

add24 

303 

1066 

16 

32 

absl6 

208 

638 

16 

16 

mult4 

163 

550 

8 

32 

We  used  Magic  to  obtain  the  actual  height  and  width  of  each  layout,  and  IRSIM 
to  obtain  the  worst-case  critical  path  delay.  This  information  will  be  used  to  evaluate 
the  performance  of  both  the  learning  system  and  the  area  and  delay  estimator. 

The  Match  tool  processes  the  flattened  descriptions  for  each  of  the  test  designs 
in  the  same  manner  as  the  training  designs.  The  output  is  a  test  data  file  that  looks 
exactly  like  a  training  set,  but  without  the  concept  information.  Eigure  19  shows  a 
sample  set  of  data  from  the  4-bit  multiplier.  The  Match  tool  also  produces  a  file  that 
describes  the  physical  limitations  for  the  test  design’s  estimated  layout.  This  is  the 
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no^no,  “>  “»  "3  ,0/114. 

no, no, ==,==,==,==,==,==, ==,1/107. 
yes, no, ==,!=,!=,!=,==,!=,!=, 8/11. 
no,yes, !=, !=, !=,!=,==,!=,!=, 8/137. 


no , no ,==,==,==,==,==,==,==, 8/ 152 . 
no, no, ==,==,==,==,==,==, ==,9/11. 
no,no,==, !=,!=,!=,==,!=, !=, 9/158. 
no,no,==, !=,!=,!=,==,!=,!=, 10/12. 
no , no ,==,==,==,==,==,==,==, 10/ 1 15 . 
no, no, !=,!=,!=, !=,==, !=,!=, 11/137. 
no,no,==, !=,!=,!=,==,!=, !=,12/13. 
no,no,==, !=,!=,!=,!=,!=,!=, 141/159. 
no, no, ==,!=,!=,!=,!=,!=,!=, 142/143. 


Figure  19  Example  Data  from  4-Bit  Multiplier 

.STAT  file  shown  on  the  right  hand  side  of  Figure  15.  The  Match  tool  produces  the 
.STAT  file  by  first  asking  the  user  how  many  rows  to  use  in  the  estimated  layout.  (In 
the  case  of  the  test  designs,  we  used  the  same  number  of  rows  as  in  the  actual  layouts 
to  provide  a  valid  comparison.)  The  Match  tool  uses  this  number  of  rows  and  the 
number  of  gates  in  the  test  design  to  determine  the  required  number  of  columns.  The 
number  of  rows  and  number  of  columns  determines  the  size  of  the  estimated  layout 
and  the  physical  limitations.  The  physical  limitations  are  based  on  the  fact  there  can 
only  be  so  many  gate  pairs  that  are  next  to  each  other  on  the  same  row,  or  next  to 
each  other  on  different  rows,  or  so  many  positions  apart.  The  physical  limitations  file 
indicates  how  many  of  each  layout  concept  is  possible  in  the  estimated  layout.  This 
physical  limitations  file  is  used  in  the  next  step. 

The  second  step  is  applying  the  model  to  each  of  the  test  design’s  data  files 
to  obtain  predicted  layout  information.  As  described  in  Section  3. 3. 3.1,  one  of  the 
routines  in  the  learning  system  is  called  the  Inference  Engine.  This  routine  applies 
a  set  of  rules  to  a  data  file  and  produces  a  list  of  predicted  concepts.  The  list  of 
predicted  concepts  includes  each  gate  pair  and  their  predicted  relative  placement 
concept. 

For  this  research  and  this  example,  the  Inference  Engine  applies  the  rules  as 
follows.  First,  a  list  of  all  concepts  is  created  for  each  gate  pair.  For  our  example. 
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each  gate  pair  has  a  list  of  the  seven  concepts  shown  in  the  PDM  file  in  Figure  16. 
Initially,  each  concept  is  assigned  a  certainty  factor  of  0.0.  The  Inference  Engine 
takes  the  first  rule  from  the  set  of  rules  and  begins  applying  it  to  the  data  instances 
for  each  gate  pair.  If  a  rule  fires  for  a  gate  pair,  the  rule’s  certainty  factor  is  combined 
with  the  predicted  concept’s  current  certainty  factor  for  the  gate  pair  according  to 
Equation  4-1,  where  X  and  Y  are  the  certainty  factors  to  be  combined.  (This  is 
the  same  as  Equation  3-3.  It  is  repeated  here  for  easy  reference.)  This  combined 
certainty  factor  becomes  the  current  certainty  factor  for  that  concept. 


cf^t{X,Y)=X  +  Y{l^X)  (4-1) 

To  illustrate,  we  will  apply  a  couple  of  rules  from  Figure  18  to  some  of  the  gate 
pairs  from  Figure  19  and  show  how  the  certainty  factors  combine.  The  first  rule  in 
Figure  18  predicts  a  concept  of  nears  for  any  gate  pairs  that  do  not  have  a  common 
destination  and  have  an  unequal  number  of  internal  inputs.  This  rule’s  certainty 
factor  is  0.623.  From  the  data  in  Figure  19,  the  first  gate  pair  to  fire  the  rule  is 
9/158.  The  current  certainty  factor  for  the  concept  nears  in  this  gate  pair’s  list  of 
concepts  is  combined  with  the  rule’s  certainty  factor.  The  current  certainty  factor  is 
0.0  and  the  rule’s  certainty  factor  is  0.623,  so  the  combined  certainty  factor  is  0.623. 
After  combining  the  certainty  factors,  the  list  of  concepts  for  the  gate  pair  9/158 
looks  like  Figure  20.  This  rule  is  applied  to  the  rest  of  the  gate  pairs  and  if  any  of 
them  fire  the  rule,  their  concept  lists  will  be  updated  in  the  same  way. 

Gate  Pair:  9/158 
near_s:  0.623 
near_d:  0.0 
close_l:  0.0 
close_2:  0.0 
far_l:  0.0 

far_2:  0.0 

way_out :  0.0 


Figure  20  Concept  List  No.  1 
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The  Inference  Engine  then  takes  the  second  rule  and  applies  it  to  all  gate  pairs. 
The  second  rule  from  Figure  18  predicts  a  concept  of  near^s  for  any  gate  pairs  that 
do  not  have  a  common  destination  and  do  not  have  the  same  function.  Gate  pair 
9/158  is  one  of  the  pairs  that  fires  this  rule  so  its  concept  list  is  updated.  The  rule’s 
certainty  factor  of  0.603  is  combined  with  the  current  certainty  factor  of  0.623  for 
the  nears  concept.  This  produces  a  combined  certainty  factor  of  0.850  according  to 
Equation  4-1.  Now  the  concept  list  for  gate  pair  looks  like  Figure  21. 

Gate  Pair:  9/158 
near_s:  0.850 
near_d:  0.0 
close.l:  0.0 
close_2:  0.0 
far_l:  0.0 

far_2;  0.0 
way_out :  0.0 

Figure  21  Concept  List  No.  2 

After  applying  the  second  rule  to  the  remaining  gate  pairs  and  updating  their 
concept  lists  as  necessary,  the  Inference  Engine  takes  the  third  rule  and  applies  it. 
This  rule  predicts  a  concept  of  clost-l  with  a  certainty  factor  of  0.602  for  any  gate 
pair  whose  gates  have  different  heights  and  a  different  number  of  internal  inputs. 
Gate  pair  9/158  is  one  of  the  gate  pairs  that  fires  this  rule,  so  its  concept  list  is 
updated.  The  current  certainty  factor  for  the  close-1  concept  is  0.0,  so  combining  it 
with  the  rule’s  certainty  factor  produces  a  new  certainty  factor  of  0.602.  The  concept 
list  for  gate  pair  9/158  now  looks  like  Figure  22. 

This  rule  application  process  is  continued  until  all  rules  have  been  applied  to  all 
gate  pairs.  When  this  process  is  finished,  each  gate  pair  will  have  a  list  of  predicted 
concepts  with  their  final  certainty  factors.  Each  gate  pair’s  list  of  concepts  is  then 
rank  ordered  by  certainty  factor  in  descending  order.  The  concept  with  the  highest 
certainty  factor  becomes  the  first  choice  for  the  gate  pair.  The  concept  with  the  next 
highest  certainty  factor  becomes  the  second  choice,  and  so  on.  The  final  concept  list, 
rank  ordered  by  certainty  factor,  for  gate  pair  9/158  is  shown  in  Figure  23.  This 
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Gate  Pair:  9/158 
near_s:  0.850 
near_d:  0.0 
close_l:  0.602 
close_2:  0.0 
far_l:  0.0 

far_2:  0.0 

way_out :  0.0 


Figure  22  Concept  List  No.  3 

list  indicates  that  for  gate  pair  9/158,  the  concept  near^s  is  the  first  choice,  with 
the  concept  farA  a  very  close  second.  In  fact,  either  concept  could  be  used  as  the 
first  choice  because  they  are  so  close.  In  ties  like  this,  the  concept  representing  the 
smallest  distance  between  the  two  gates  is  listed  first,  because  the  cost  function  of 
the  target  layout  tool  is  based  on  finding  the  minimal  area. 


Gate  Pair:  9/158 
near_s:  0.991 
far_l:  0.990 

close.l:  0.975 
way_out :  0 . 605 
near_d:  0.0 
close_2:  0.0 
far_2:  0.0 


Figure  23  Final  Concept  List 

At  first  glance,  the  second  choice  of  farA  after  the  first  choice  of  near.s  does  not 
make  sense  spatially.  However,  as  it  turns  out,  the  set  of  rules  does  not  contain  any 
rules  predicting  the  concept  of  nearA.  This  is  because  for  this  particular  training 
set,  there  is  not  enough  evidence  supporting  any  rules  for  the  concept  of  near  A.  In 
addition,  based  on  evidence  in  the  training  set,  the  gates  in  a  gate  pair  with  features 
like  gate  pair  9/158  want  to  be  placed  next  to  each  other,  but  if  that  is  not  possible, 
a  placement  further  away  is  warranted.  Otherwise,  if  we  were  to  always  use  the 
next  spatial  concept  as  a  second  choice,  we  would  violate  what  was  learned  from  the 
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training  set.  This,  in  turn,  would  go  against  the  goal  of  this  research:  using  machine 
learning  to  model  the  input-to-output  relationships  from  using  layout  tools. 

Now  that  we  have  rank-ordered  each  gate  pair’s  concept  list,  the  entire  list  of 
logically  connected  gate  pairs  is  rank  ordered  in  descending  order  using  the  certainty 
factor  from  the  gate  pair’s  first  choice  concept.  The  gate  pair  with  the  highest 
certainty  factor  for  the  first  choice  concept  is  now  the  first  gate  pair  in  the  list.  The 
gate  pair  with  the  next  highest  certainty  factor  for  the  first  choice  is  the  second  in 
the  list,  and  so  on. 

Starting  at  the  top  of  this  list,  each  gate  pair  is  assigned  its  first  choice  concept, 
unless  that  concept  is  unavailable  due  to  physical  limitations.  (This  means  it  is  not 
possible  to  place  the  two  gates  the  appropriate  number  of  positions  apart.)  Each  time 
a  concept  is  assigned,  the  physical  limitation  file  is  updated.  When  that  concept  is  no 
longer  possible  for  the  remaining  gate  pairs  in  the  predicted  layout,  any  unassigned 
gate  pairs  in  the  list  having  that  concept  as  their  first  choice  must  now  use  their 
second  choice,  if  it  is  available.  This  is  another  example  of  a  gray  area  in  the  method. 
It  is  not  necessary  to  assign  a  gate  pair  its  first  choice  of  relative  placement  concepts. 
For  example,  if  the  physical  limitations  allow  only  so  many  concepts  of  nears,  the 
gate  pairs  with  the  highest  certainty  factors  for  this  concept  should  have  it  assigned 
first.  When  the  available  number  of  this  concept  runs  out,  then  the  gate  pair’s  second 
choice  can  be  assigned.  Having  a  gray  area  gives  the  method  flexibility  when  assigning 
concepts  so  physical  limitations  of  the  domain  can  be  satisfied. 

The  assignment  process  is  repeated  until  all  gate  pairs  have  an  assigned  concept. 
The  end  result  is  a  file  that  lists  all  gate  pairs  with  their  predicted  concepts.  This 
file  is  rank  ordered  so  the  gate  pairs  near  the  top  have  been  assigned  concepts  with 
the  best  certainty  factors.  The  gate  pairs  towards  the  end  of  the  list  may  not  have 
been  assigned  their  first  or  second  choice  concepts,  but  the  certainty  factors  are  lower 
towards  the  end  due  to  the  rank  ordering.  This  means  there  is  a  higher  probability 
that  the  first  choice  concepts  may  be  incorrect  as  it  is.  The  physical  limitations  define 
the  bounds  of  the  predicted  layout  and  enhances  the  overall  method. 

If  none  of  the  rules  fire  for  any  of  the  gate  pairs,  the  third  step  is  taken  and  a 
different  layout  tool  model  is  used.  In  our  example,  we  obtained  a  complete  set  of 
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predicted  concepts  for  each  of  the  test  designs.  We  did  not  need  to  use  a  different 
model.  Now  that  we  have  the  predicted  layout  information,  we  are  ready  to  move  on 
to  estimating  area  and  delay. 


4.4  Estimating  Delay 

One  of  the  performance  measures  for  a  digital  design  is  the  execution  speed. 
Designers  and  users  need  to  know  how  fast  a  design  can  process  the  inputs  and 
produce  a  usable  output.  In  following  the  old  adage,  “A  chain  is  as  strong  as  its 
weakest  link”,  we  can  say  a  design  is  as  fast  as  its  slowest  link.  This  means  the 
slowest  path  from  input  to  output  determines  how  fast  the  design  can  execute.  This 
slowest  path  is  called  the  critical  path.  The  delay  of  this  critical  path  is  inversely 
proportional  to  its  speed.  The  lower  the  delay,  the  faster  the  speed,  and  vice  versa. 

The  overall  delay  of  the  critical  path  is  a  sum  of  all  the  delays  between  nodes  on 
the  path.  For  this  research,  we  calculate  the  delays  between  nodes  as  a  function  of  the 
driving  node’s  intrinsic  delay,  the  delay  due  to  the  length  of  wire  on  the  driving  node’s 
output,  and  the  load  that  the  driven  node(s)  place  on  the  driving  node.  The  driving 
node’s  intrinsic  delay  comes  from  a  physical  library  file.  The  load  from  the  driven 
node(s)  is  a  function  of  the  number  of  driven  nodes.  This  comes  from  the  design’s 
netlist.  The  length  of  wire  on  the  driving  node’s  output  comes  from  knowledge  of 
the  design’s  layout.  In  a  synthesis  system  using  the  method  from  this  research,  the 
system  does  not  know  about  the  design’s  actual  layout.  However,  it  does  have  layout 
information  from  applying  the  layout  tool  model.  This  layout  information  is  in  the 
form  of  predicted  relative  placement  concepts  for  all  logically  connected  gate  pairs. 
The  relative  placement  of  two  gates  translates  directly  into  the  length  of  wire  needed 
to  connect  them. 

Therefore,  the  first  step  in  estimating  delay  is  obtaining  the  wirelengths  between 
all  logically  connected  gates.  The  relationship  between  relative  placement  concepts 
and  wirelength  is  shown  in  Table  7.  For  example,  a  relative  placement  concept  of 
close^l  predicts  the  two  gates  as  being  either  two  or  three  positions  apart.  This 
translates  into  a  wirelength  of  about  210  microns.  These  wirelength  relationships 
were  determined  by  performing  a  statistical  analysis  of  several  actual  layouts.  We 
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found  the  average  wirelength  between  two  gates  next  to  each  other  to  be  about  70 
microns.  The  wirelengths  for  gates  placed  further  apart  is  nearly  linear.  Because  we 
are  interested  in  worst  case  delay,  we  used  the  largest  number  of  positions  apart  in 
each  concept  to  determine  the  wirelength. 


Table  7  Relationship  of  Wirelength  to  Relative  Placement 


Relative 

Placement 

Concept 

Number  of 
Positions 
Apart 

Approximate 

Wirelength 

(microns) 

near_s 

1 

70 

near_d 

1 

70 

close_l 

2-3 

210 

close_2 

4-5 

350 

far_l 

6-7 

490 

far_2 

8-9 

630 

way_out 

lOH- 

>700 

Here  again,  the  concept  of  way-Out  has  proven  sufficient  for  gate  pairs  that  are 
further  than  10  positions  apart.  This  concept  is  only  a  categorization  of  these  gate 
pairs.  The  predicted  relative  placement  gives  us  the  number  of  positions  apart  and 
the  estimator  calculates  the  necessary  wirelength.  For  example,  a  gate  pair  that  is 
20  positions  apart  would  have  the  concept  of  way-out,  but  the  wirelength  between 
the  two  gates  would  be  1400  microns  (20  x  70). 

To  obtain  the  wirelength  information,  the  Match  tool  reads  in  the  list  of  gate 
pairs  and  their  relative  placement  concepts.  The  Match  tool  then  translates  each 
relative  placement  concept  into  a  wirelength  value  for  the  gate  pair  according  to 
the  relationships  in  Table  7.  These  wirelength  values  are  used  in  the  next  step  to 
determine  the  delay  between  the  gates  on  the  critical  path. 

The  second  step  in  estimating  delay  is  performing  a  critical  path  analysis.  This 
analysis  is  first  performed  based  on  the  critical  path  having  a  low-high  transition  on 
the  path’s  output.  The  design  is  treated  as  a  forest  of  trees.  Each  output  gate  is  a 
root  for  one  of  these  trees.  The  input  gates  are  the  leaves.  The  estimator  starts  at 
an  output  gate  and  travels  down  the  tree  in  a  depth-first  search  until  it  reaches  an 
input  gate.  The  estimator  then  backtracks  up  the  tree  one  level,  to  its  parent  gate. 
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When  the  estimator  backtracks,  it  calculates  and  stores  the  delay  from  the  child  gate 
to  the  parent  gate.  (This  delay  includes  the  gate’s  intrinsic  delay  from  the  physical 
characteristics  library  and  the  delay  due  to  the  wires  and  interconnect  loading.)  The 
estimator  then  travels  down  the  next  branch,  if  there  is  one.  When  an  input  gate  is 
reached,  the  estimator  backtracks  and  calculates  the  delay.  This  is  repeated  for  all 
branches  from  the  parent  gate.  The  branch  with  the  worse  delay  becomes  part  of  the 
critical  path  for  the  root  output  gate.  The  worse  case  delay  becomes  the  delay  so  far 
for  the  parent  gate.  The  estimator  then  backtracks  up  to  the  next  level,  the  parent 
of  the  parent,  and  calculates  the  delay.  Once  again,  the  estimator  travels  down  each 
of  the  other  branches  until  it  reaches  the  input  gates,  backtracks  and  calculates  the 
delays.  In  a  depth-first  fashion,  the  estimator  searches  the  entire  tree,  calculating 
the  delays  and  keeping  track  of  the  critical  path  as  it  backtracks.  Eventually,  the 
estimator  will  backtrack  all  the  way  back  to  the  output  gate.  When  it  reaches  the 
output  gate,  both  the  critical  path  and  the  critical  path  delay  for  that  output  will 
be  known.  This  process  is  repeated  for  each  tree.  After  all  trees  have  been  searched 
and  all  critical  paths  and  delays  are  known,  the  critical  path  with  the  longest  delay 
becomes  the  critical  path  for  the  test  design. 

As  the  estimator  travels  down  each  tree,  it  assigns  alternating  values  of  1  and  0 
to  the  gates.  In  this  case,  the  estimator  is  calculating  the  critical  path  delay  for  the 
output  gate  having  a  low-high  transition.  So  the  estimator  assigns  the  output  gate  a 
value  of  1.  The  next  gate  down  the  tree  is  assigned  a  value  of  0.  This  is  repeated  until 
every  gate  in  the  tree  has  an  assigned  value.  The  value  assigned  to  a  gate  determines 
what  delay  equation  is  used  for  that  gate.  If  a  value  of  1  is  assigned,  Equation  4-2 
is  used  to  calculate  the  delay.  If  a  value  of  0  is  assigned.  Equation  4-3  is  used.  This 
method  makes  the  critical  path  delay  appear  data  dependent.  Otherwise,  using  the 
same  delay  equation  for  each  gate  produces  a  very  pessimistic  worse  case  delay. 

The  delay  equations  come  from  research  on  the  Tinker  Tool  system.  Equa¬ 

tion  4-2  calculates  the  delay  for  a  gate  having  a  low-high  transition.  Equation  4-3 
calculates  the  delay  for  a  gate  having  a  high-low  transition.  These  equations  use  the 
average  wire  length,  AWL,  and  the  number  of  interconnects,  INC,  to  determine  the 
delays.  The  average  wire  length  determines  the  delay  due  to  all  of  the  wire  connected 
to  the  output  of  the  gate,  and  the  number  of  interconnects  determines  the  delay  due 
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to  the  load  on  the  output  (i.e.,  the  number  of  gates  being  driven  by  this  gate). 

delaj/o-i  =  awl  x  0.0022  +  INC  x  0.3740  (4-2) 

de/aj/i_o  =  AWL  x  0.0010  +  INC  x  0.1217  (4-3) 

Using  these  equations  and  the  predicted  layout  information,  the  estimator  can  find 
the  critical  path.  Figure  24  shows  the  estimated  critical  path  for  the  4-bit  multiplier 
test  design  with  a  low-high  transition  on  the  path’s  output.  Each  line  shows  the  delay 
information  for  one  gate  on  the  path.  The  gate’s  identifier  begins  the  line.  Next  comes 
the  transition  information.  A  “0”  indicates  a  high-low  transition  and  a  “1”  indicates 
a  low-high  transition.  The  value  immediately  following  the  symbol  indicates  the 
time  at  which  the  transition  is  complete.  The  next  value  (in  parentheses)  indicates 
the  elasped  time  for  the  transition  from  beginning  to  end. 

Starting  at  the  bottom  of  the  figure,  the  last  line  shows  the  input  gate  on  the 
critical  path.  For  example,  this  input  gate  is  named  -10  in  the  4-bit  multiplier 

design.  The  input  is  applied  to  this  gate  at  time  0.00.  This  gate  goes  low  at  time 

1.01.  This  is  1.01  ns  after  the  input  is  applied.  The  next  gate,  xlyl,  goes  high  at 
time  6.05.  This  is  5.05  ns  after  gate  -10  goes  low.  This  indicates  a  delay  of  5.05  ns. 
The  critical  path  can  be  traced  by  starting  at  the  bottom  and  working  up  to  the  top 
line.  The  gate  at  the  top  is  the  output  gate  for  the  critical  path.  For  this  design, 
following  the  critical  path  up  to  the  output  gate  indicates  a  overall  delay  of  52.64  ns. 

The  third  step  in  estimating  delay  is  repeating  the  second  step  by  performing 
a  critical  path  analysis  based  on  the  critical  path  having  a  high-low  transition  on 
the  path’s  output.  Performing  two  critical  path  analyses  gave  us  more  flexibility 
when  comparing  to  critical  path  delays  from  the  IRSIM  simulator.  Because  the 
actual  delays  come  from  running  simulations  on  the  test  designs  using  a  set  of  input 
vectors,  the  critical  path  delay  is  input  data  dependent.  In  addition,  the  output  gate 
on  the  critical  path  does  not  always  have  a  high  or  low  transition.  We  needed  a  way 
to  choose  the  appropriate  output  transition  so  we  could  make  valid  comparisons  and 
calculate  the  right  errors.  Figure  25  shows  the  estimated  critical  path  for  the  4-bit 
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product [7]  ->  1  @  52.64  (0.40ns) 

_buf_l.t[0]  ->  0  @  52.24  (0.75ns) 

c4c2  ->  1  @  51.49  (2.58ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_0._4  ->  0  @  48.91  (1.12ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_0.g  ->  1  ©  47.79  (3.32ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_0._l  ->  0  ©  44.47  (1.28ns) 

c4cl  ->  1  ©  43.19  (3.33ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_l._4  ->  0  ©  39.86  (0.97ns) 

addc_bl_dO_cl.addc_bl_dO_cl_s_l.g  ->  1  ©  38.89  (3.00ns) 

addc_bl_dO_cl.addc_bl_dO_cl_s_l._l  ->  0  ©  35.90  (1.20ns) 
c4c0  ->  1  ©  34.69  (2.91ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_2._6  ->  0  ©  31.79  (0.94ns) 
c3c0  ->  1  ©  30.84  (2.84ns) 

addc_bl_dO_cl.addc_bl_dO_cl_s_6._4  ->  0  ©  28.00  (1.19ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_6.g  ->  1  ©  26.81  (3.48ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_6._0  ->  0  ©  23.33  (1.42ns) 

s2sl  ->  1  ©  21.90  (3.02ns) 

addc_bl_d0_cl.addc.bl_d0_cl_s_7._xor_15.tC0]  ->  0  ©  18.89  (0.70ns) 
addc_bl_d0_cl.addc_bl_d0_cl_s_7._5  ->  1  ©  18.19  (2.24ns) 

addc_bl_dO_cl.addc_bl_dO_cl_s_7._xor_14.tCO]  ->  0  ©  15.95  (0.55ns) 
del  ->  1  ©  15.40  (2.91ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_10._4  ->  0  ©  12.50  (0.82ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_10.g  ->  1  ©  11.67  (4.13ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_10._l  ->  0  ©  7.54  (1.48ns) 

xlyl  ->  1  ©  6.05  (5.05ns) 

_10  ->  0  ©  1.01  (1.01ns) 


Figure  24  Estimated  Critical  Path  for  4-Bit  Multiplier  (Output  High) 

multiplier  with  a  high-low  transition  on  the  path’s  output.  The  overall  delay  for  this 
critical  path  is  44.99  ns. 

For  comparison  purposes,  Figure  26  shows  the  critical  path  that  IRSIM  found 
for  the  4-hit  multiplier  hy  running  a  simulation  using  electrical  information  from  the 
actual  layout.  This  simulation  showed  the  real  critical  path  has  the  output  going  low, 
with  an  overall  delay  of  44.5  ns.  If  we  compare  Figure  26  to  Figure  25  line  by  line,  we 
can  see  the  delay  estimator  tracks  well  with  the  simulation  from  IRSIM.  (Note:  the 
IRSIM  tool  produces  its  critical  path  list  in  reverse  order  from  the  estimated  critical 
path  lists.  However,  each  line  contains  the  same  type  of  data  found  in  each  line  of 
the  estimator’s  critical  path  descriptions.) 
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product [7]  ->  0  @  44.99  (0.30ns) 

_buf_l.tC0]  =  44.69  (1.51ns) 

c4c2  ->  0  (§  43.19  (1.09ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_0._4  =  42.09  (2.32ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_0.g  ->  0  @  39.77  (0.92ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_0._l  =  38.86  (2.78ns) 

c4cl  ->  0  @  36.08  (1.33ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_l._4  =  34.75  (2.00ns) 

addc_bl_dO_cl . addc_bl_dO_cl_s_l .g  ->  0  @  32.75  (0.77ns) 

addc_bl_dO_cl.addc_bl_dO_cl_s_l._l  =  31.98  (2.61ns) 

c4c0  ->  0  0  29.37  (1.29ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_2._6  =  28.08  (2.26ns) 

c3c0  ->  0  0  25.82  (1.11ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_6._4  =  24.71  (2.48ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_6.g  ->  0  ©  22.22  (0.99ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_6._0  =  21.23  (3.10ns) 

s2sl  ->  0  ©  18.13  (0.87ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_7._xor_15.tC0]  =  17.26  (2.83ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_7._5  ->  0  ©  14.43  (0.56ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_7._xor_14.t[0]  =  13.87  (2.51ns) 

del  ->  0  ©  11.36  (1.29ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_10._4  =  10.07  (1.67ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_10.g  ->  0  ©  8.40  (1.28ns) 

addc_bl_dO_cl .addc_bl_d0_cl_s_10._l  =7.12  (3.13ns) 

xlyl  ->  0  ©  3.99  (1.60ns) 

_10  =  2.38  (2.38ns) 


Figure  25  Estimated  Critical  Path  for  4-Bit  Multiplier  (Output  Low) 

Table  8  shows  the  delay  estimates  we  obtained  for  the  five  test  designs  and  the 
percent  error  between  these  delay  estimates  and  the  actual  delays  from  IRSIM  simu¬ 
lations.  This  table  shows  the  model  performs  quite  well  estimating  delay  in  its  final 
implementation  for  this  example.  This  table  also  shows  that  even  though  the  model 
is  based  on  a  training  set  using  all  adder  designs,  the  model  works  well  on  other  types 
of  adder-based  designs.  The  percent  errors  for  the  five  test  designs  are  all  under  10%. 
(More  results  are  presented  in  Chapter  5.) 
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y_l  ->  1  @  5000.0ns  ,  node  was  an  input 
_10  ->  0  <9  5000.9ns  (0.9ns) 

xlyl  ->  1  @  5005.9ns  (5.0ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_10._l  ->  0  @  5007.3ns 
addc_bl_d0_cl.addc_bl_d0_cl_s_10.g  ->  1  ©  5009.3ns 
addc_bl_d0_cl.addc_bl_d0_cl_s_10._4  ->  0  ©  5010.4ns 
del  ->  1  ©  5012.5ns  (2.1ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_7._xor_14.t_0  ->  0  © 
addc_bl_d0_cl.addc_bl_d0_cl_s_7._5  ->  1  ©  5017.2ns 
s2sl  ->  0  ©  5020.4ns  (3.2ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_6._0  ->  1  ©  5022.0ns 
addc_bl_d0_cl.addc_bl_d0_cl_s_6.p  ->  0  ©  5023.2ns 
addc_bl_d0_cl.addc_bl_d0_cl_s_6._6  ->  1  ©  5024.9ns 
c3c0  ->  0  ©  5027.2ns  (2.3ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_2._6  ->  1  ©  5029.4ns 
c4c0  ->  0  ©  5031.0ns  (1.6ns) 

addc_bl_dO_cl.addc_bl_dO_cl_s_l._l  ->  1  ©  5032.4ns 
addc_bl_dO_cl.addc_bl_dO_cl_s_l.p  ->  0  ©  5033.4ns 
addc_bl_d0_cl.addc_bl_d0_cl_s_l._6  ->  1  ©  5034.7ns 
c4cl  ->  0  ©  5037.0ns  (2.3ns) 

addc_bl_d0_cl.addc_bl_d0_cl_s_0._l  ->  1  ©  5038.5ns 
addc_bl_dO_cl .addc_bl_d0_cl_s_0.p  ->  0  ©  5039.4ns 
addc_bl_d0_cl.addc_bl_d0_cl_s_0._6  ->  1  ©  5041.2ns 
c4c2  ->  0  ©  5042.3ns  (1.1ns) 

_buf_l.t_0  ->  1  ©  5043.9ns  (1.6ns) 

product_7  ->  0  ©  5044.5ns  (0.6ns) 


(1.4ns) 

(2.0ns) 

(1.1ns) 

5013.5ns  (1.0ns) 
(3.7ns) 

(1.6ns) 

(1.2ns) 

(1.7ns) 

(2.2ns) 

(1 .4ns) 

(1.0ns) 

(1.3ns) 

(1.5ns) 

(0.9ns) 

(1.8ns) 


Figure  26  IRSIM  Critical  Path  for  4-Bit  Multiplier 
4.5  Estimating  Area 

To  keep  the  estimation  process  consistent,  the  area  estimates  are  based  on  the 
same  information  as  the  delay  estimates.  This  information  is  the  list  of  gate  pairs  and 
their  relative  placement  concepts  produced  by  applying  the  model  to  a  target  design. 
The  relative  placement  concepts  indicate  where  the  two  gates  are  placed  in  relation 
to  each  other  and  is  used  to  generate  an  estimated  layout.  This  estimated  layout  is 
not  a  real  layout  nor  should  it  be  misconstrued  as  a  replacement  for  a  real  layout. 
It  only  provides  information  on  the  predicted  placement  of  gates  so  the  height  and 
width  can  be  estimated. 
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Table  8  Delay  Estimates  for  Example  Test  Designs 


Design 

Name 

Number 
of  Gates 

Actual 
Delay  (ns) 

Estimated 
Delay  (ns) 

Percent 

Error 

add4 

43 

19.1 

19.2 

0.54% 

add  12 

147 

61.5 

60.35 

-1.87% 

add24 

303 

134.0 

125.06 

-6.67% 

ahsl6 

208 

109.0 

112.36 

3.08% 

mult4 

163 

44.5 

44.99 

1.11% 

Therefore,  the  first  step  in  estimating  area  is  using  the  list  of  relative  placement 
concepts  to  produce  an  estimated  layout.  As  described  in  Section  3. 3. 3. 3,  the  esti¬ 
mator  (part  of  the  Match  tool)  “grows”  an  estimated  layout.  The  first  gate  from 
the  first  gate  pair  in  the  list  of  predicted  concepts  is  planted  in  the  center  of  the 
estimated  layout.  This  becomes  the  “seed”  gate.  The  second  gate  from  the  first  gate 
pair  is  placed  in  relation  to  the  seed  gate  according  to  the  predicted  concept  for  the 
gate  pair.  In  other  words,  when  the  estimated  layout  is  finished,  this  gate  will  be 
located  the  appropriate  Manhattan  distance  away  from  the  seed  gate.  Due  to  the 
rank  ordering  of  the  gate  pairs  according  to  certainty  factors,  the  first  gate  pair  in 
the  list  has  an  assigned  concept  with  the  best  certainty  factor  of  all.  This  means  this 
gate  pair  has  the  best  chance  of  having  the  correct  relative  placement  concept.  This 
is  why  we  start  at  the  top  of  the  list. 

The  placement  of  these  two  gates  is  shown  in  Figure  27  for  the  d-bit  adder.  The 
first  gate  pair  in  the  list  consists  of  gates  2  and  40.  Gate  2,  the  seed  gate,  is  located 
in  the  middle  of  the  estimated  layout  and  gate  40  is  located  four  positions  away  in 
accordance  with  its  predicted  relative  placement  concept.  (The  nils  are  place  holders.) 
Therefore,  gate  40  has  a  Manhattan  distance  of  four  positions  away  from  gate  2. 

Because  Artistll  layouts  require  the  number  of  rows  and  number  of  columns  to 
be  a  power  of  two,  there  will  always  be  an  even  number  of  rows  and  columns.  This 
means  there  really  is  no  exact  “middle”  position.  So  the  estimator  first  divides  the 
estimated  layout  into  an  upper  half  and  a  lower  half.  For  this  example,  the  upper  half 
consists  of  the  first  four  rows  and  the  bottom  half  consists  of  the  last  four  rows.  The 
estimator  then  divides  the  bottom  half  into  two  quarters.  The  left  quarter  consists 
of  the  first  four  columns,  and  the  right  quarter  consists  of  the  last  four  columns. 
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(add4.est 

(row  rl  nil  nil  nil  nil  nil  nil  nil  nil) 

(row  r2  nil  nil  nil  nil  nil  nil  nil  nil) 

(row  r3  nil  nil  nil  nil  nil  nil  nil  nil) 

(row  r4  nil  nil  nil  nil  nil  nil  nil  g40) 

(row  r5  nil  nil  nil  nil  g2  nil  nil  nil) 

(row  r6  nil  nil  nil  nil  nil  nil  nil  nil) 

(row  r7  nil  nil  nil  nil  nil  nil  nil  nil) 

(row  r8  nil  nil  nil  nil  nil  nil  nil  nil) 

(column  al  rl  r2  r3  r4  r5  r6  r7  r8) 

) 


Figure  27  Estimated  Layout  for  4-Bit  Adder  No.  1 

The  seed  gate  is  planted  in  the  upper  left  position  of  the  lower  right  quarter  of  the 
estimated  layout.  The  seed  gate  could  also  be  planted  in  the  upper  right  position  of 
the  lower  left  quarter,  or  the  lower  right  of  the  upper  left  quarter,  or  even  in  the  lower 
left  of  the  upper  right  quarter.  Any  of  these  four  locations  are  close  to  the  middle  of 
the  estimated  layout,  so  it  does  not  matter. 

Once  the  seed  gate  is  planted,  the  estimator  starts  looking  for  a  location  for  the 
next  gate  at  the  lower  right  corner  of  the  upper  half  of  the  estimated  layout.  In 
this  case,  this  position  satisfied  the  relative  placement  concept  so  gate  40  was  placed 
there.  If  this  location  did  not  satisfy  the  concept,  the  estimator  would  scan  to  the 
left  and  evaluate  each  vacant  (nil)  position.  If  no  acceptable  positions  were  found  in 
that  row,  the  estimator  would  move  up  one  row  and  start  over  from  the  right. 

After  the  seed  gate  and  its  paired  gate  are  placed,  the  list  is  searched  in  descending 
order  for  other  gates  connected  to  the  seed  gate.  These  gates  are  placed  according 
to  their  predicted  concepts.  If  a  gate  cannot  be  placed  in  a  position  that  satisfies 
its  relative  placement  concept,  it  is  placed  in  the  next  best  position.  For  example,  a 
gate  pair  with  a  predicted  concept  of  close^l  means  the  two  gates  should  be  placed 
a  Manhattan  distance  of  four  or  five  positions  apart.  Because  one  of  the  gates  will 
already  be  located  somewhere  in  the  layout,  the  estimator  only  has  to  find  an  empty 
position  for  the  other  gate.  This  other  gate  should  be  located  four  or  five  positions 
away.  If  there  are  no  empty  positions  that  satisfy  this  requirement,  the  requirement 
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is  relaxed  and  the  gate  may  be  located  either  three  or  six  positions  away.  This 
placement  method  tries  to  first  use  what  was  learned  from  the  training  set,  but  if 
this  is  not  possible,  it  then  uses  the  physical  limitations  of  the  layout  as  a  guide.  This 
is  another  example  of  a  gray  area  in  the  method.  A  gate  may  be  placed  close  enough 
to  where  it  needs  to  go  to  satisfy  both  the  model  and  the  physical  limitations. 

In  the  example,  the  next  gate  pair  in  the  list  that  includes  gate  2  includes  gate  41. 
The  relative  placement  concept  for  this  gate  pair  indicates  gate  41  should  be  placed 
two  or  three  positions  away  from  gate  2.  This  is  shown  in  Figure  28. 


(add4.est 

(row  rl  nil  nil  nil  nil  nil  nil  nil  nil) 

(row  r2  nil  nil  nil  nil  nil  nil  nil  nil) 

(row  r3  nil  nil  nil  g41  nil  nil  nil  nil) 

(row  r4  nil  nil  nil  nil  nil  nil  nil  g40) 

(row  r5  nil  nil  nil  nil  g2  nil  nil  nil) 

(row  r6  nil  nil  nil  nil  nil  nil  nil  nil) 

(row  r7  nil  nil  nil  nil  nil  nil  nil  nil) 

(row  r8  nil  nil  nil  nil  nil  nil  nil  nil) 

(column  al  rl  r2  r3  r4  rS  r6  r7  r8) 

) 


Figure  28  Estimated  Layout  for  4-Bit  Adder  No.  2 

Once  all  gates  logically  connected  to  the  seed  gate  have  been  placed,  the  estimator 
goes  back  to  the  top  of  the  list  and  finds  the  next  unplaced  gate.  Because  this  gate 
does  not  share  a  predicted  relative  placement  concept  with  the  seed  gate,  it  can  be 
placed  anywhere  and  not  violate  any  rules.  Therefore,  it  is  placed  in  the  next  available 
position  near  the  middle  of  the  estimated  layout.  In  the  4-bit  adder  example,  gate  2 
is  only  connected  to  gates  32,  33,  40,  and  41.  The  estimator  goes  down  the  list  and 
places  each  of  these  gates.  Then  the  estimator  goes  back  to  the  top  and  looks  for 
the  next  unplaced  gate.  This  is  gate  1.  It  is  not  connected  to  gate  2,  but  it  is  placed 
next  to  gate  2.  The  end  result  of  placing  these  gates  is  shown  in  Figure  29. 

The  placement  process  is  repeated  until  all  gates  have  been  placed.  Figure  30 
shows  the  final  estimated  layout  after  all  gates  have  been  placed.  Once  again,  this 
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(add4.est 

(row  rl 

nil 

nil 

nil 

(row  r2 

nil 

nil 

nil 

(row  r3 

g32 

nil 

nil 

(row  r4 

nil 

nil 

nil 

(row  r5 

nil 

nil 

nil 

(row  r6 

nil 

nil 

nil 

(row  r7 

nil 

nil 

nil 

(row  r8 

nil 

nil 

nil 

(column 

al  rl  r2  r3 

) 


nil  nil  nil  nil  nil) 
nil  nil  nil  g33  nil) 
g41  nil  nil  nil  nil) 
nil  nil  nil  nil  g40) 
nil  g2  gl  nil  nil) 
nil  nil  nil  nil  nil) 
nil  nil  nil  nil  nil) 
nil  nil  nil  nil  nil) 
r4  r5  r6  r7  r8) 


Figure  29  Estimated  Layout  for  4-Bit  Adder  No.  3 

estimated  layout  is  only  a  vehicle  to  estimate  area  based  on  relative  placement  con¬ 
cepts.  It  should  not  be  treated  as  a  real  layout  or  as  a  replacement  for  one. 


(add4.est 

(row 

rl 

nil 

nil 

g37 

g28 

nil  nil  nil  nil) 

(row 

r2 

nil 

gl6 

gll 

g38 

g29  g7  g33  gl3) 

(row 

r3 

g32 

gl2 

g24 

g41 

g5  g9  g25  g42) 

(row 

r4 

g23 

gl9 

g34 

g20 

gl7  g22  g31  g40) 

(row 

r5 

nil 

nil 

nil 

nil 

g2  gl  gl4  gO) 

(row 

r6 

nil 

nil 

nil 

nil 

g4  g8  g6  g3) 

(row 

r7 

nil 

nil 

nil 

nil 

g30  g21  g39  gl5) 

(row 

r8 

nil 

nil 

nil 

nil 

g35  g27  g26  gl8) 

(column 

al  rl  r2  r3 

r4  r5  r6  r7  r8) 

) 


Figure  30  Final  Estimated  Layout  for  4-Bit  Adder 

Other  layout  “growing”  techniques  were  tried,  like  starting  in  one  corner  and 
growing  inward.  We  also  tried  starting  in  the  left-most  position  in  the  middle  row 
and  alternating  the  growing  cycle  between  the  lower  half  of  the  layout  and  the  upper 
half.  The  technique  we  presented  here  gave  the  best  estimates.  This  is  because  every 
row  has  at  least  one  gate,  and  there  is  at  least  one  row  that  is  completely  filled  with 
gates.  This  gives  us  height  and  width  estimates  slightly  on  the  pessimistic  side.  A 
comparison  of  these  estimates  and  the  actual  values  is  presented  later  in  this  section. 
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In  addition,  it  turns  out  that  placing  groups  of  connected  gates  into  and  around 
other  groups  has  a  very  small  effect  on  the  final  area  estimate.  In  other  words,  grow¬ 
ing  clusters  of  connected  gates  on  top  of  clusters  of  other  connected  gates  provided 
good  results.  This  is  because  we  are  interested  in  the  relative  placement  of  logically 
connected  gate  pairs.  If  two  gates  are  not  logically  connected,  we  can  place  them 
anywhere  in  relation  to  each  other,  and  not  violate  any  rules.  In  fact,  in  some  cases, 
doing  this  may  satisfy  some  physical  limitation.  One  factor  that  allows  this  is  that  in 
most  designs,  all  gates  are  indirectly  connected  to  each  other.  This  means  that  any 
given  cluster  usually  has  at  least  one  interconnection  to  another  cluster.  This  inter¬ 
connection  means  there  is  a  predicted  relative  placement  for  the  two  gates  involved, 
one  from  one  cluster  and  one  from  the  other  cluster.  This  relative  placement  helps 
to  determine  how  the  clusters  are  “grown”  on  top  of  each  other. 

An  alternative  to  this  is  to  predict  a  relative  placement  for  all  gate  pairs.  Then 
we  would  know  where  each  gate  should  be  placed  according  to  other  gates  already 
placed.  However,  as  described  in  Section  3.3.2. 1,  the  total  number  of  gate  pairs  for  a 
design  is  of  O(n^),  where  n  is  the  number  of  gates.  This  would  make  it  very  difficult 
to  handle  large  designs,  even  with  only  a  few  thousand  gates. 

Now  that  we  have  generated  an  estimated  layout,  we  can  derive  an  area  estimate. 
There  are  two  components  to  area  estimation:  the  area  of  the  gates  themselves  and 
the  wire  area.  The  estimator  finds  the  area  of  the  estimated  layout  by  finding  the 
area  of  the  gates  first  and  then  adds  in  the  required  wire  area. 

Therefore,  the  second  step  in  estimating  area  is  finding  the  minimum  height  and 
width  of  the  estimated  layout  due  to  just  the  gates  themselves.  First,  the  estimator 
finds  the  tallest  gate  in  each  row.  The  height  of  this  gate  becomes  the  height  of  that 
row.  For  example,  in  the  fourth  row  in  Figure  30,  the  tallest  gate  is  gate  34.  This 
gate  has  a  height  of  73  lambda.  Therefore,  the  height  of  the  entire  row  must  be  at 
least  73  lambda  in  order  to  accomodate  this  gate.  The  heights  of  the  other  gates 
do  not  change.  They  are  shorter  so  they  can  fit  in  this  row  with  no  problem.  Once 
the  minimum  height  of  each  row  is  found,  the  heights  of  all  the  rows  are  summed 
to  produce  a  minimum  height  of  the  estimated  layout.  The  minimum  height  of  the 
estimated  layout  in  Figure  30  is  478  lambda. 
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Next,  the  estimator  calculates  the  width  of  each  row.  In  the  example  in  Figure  30, 
the  third  row  is  the  widest.  It  has  a  width  of  298  lambda.  This  becomes  the  minimum 
width  of  the  estimated  layout. 

The  third  step  is  adding  in  the  wire  area.  For  this  research,  we  used  a  very  simple 
model  of  wiring  area.  The  more  columns  in  a  layout,  the  more  horizontal  wires 
needed.  The  more  rows  in  a  layout,  the  more  vertical  wires  needed.  The  horizontal 
wires  increase  the  height  of  the  layout  and  the  vertical  wires  increase  the  width.  In 
addition,  this  model  takes  into  account  that  densely  populated  layouts  require  extra 
area  for  wires  than  sparsely  populated  layouts.  The  wires  have  more  room  to  travel 
among  the  gates  in  a  sparsely  populated  layout. 

Other  wire  area  models  like  Rent’s  rule  and  the  Left  Edge  Algorithm  do 
not  work  well  for  the  type  of  estimated  layout  used  here.  This  estimated  layout  is 
based  solely  on  the  relative  placement  of  logically  connected  gate  pairs.  It  does  not 
try  to  minimize  a  cost  function  based  on  interconnections  or  wire  area.  Using  a  wire 
area  model  like  Rent’s  rule  or  the  Left  Edge  Algorithm  produced  overly  pessimistic 
wire  areas. 

The  wire  area  model  we  used  is  a  function  of  the  number  of  rows,  columns,  and 
gates  in  the  design.  For  small  designs  with  less  than  100  gates,  the  wire  area  did  not 
contribute  to  the  overall  area.  The  estimated  area  of  these  designs  is  based  solely  on 
the  minimum  height  and  width  due  to  the  gates.  For  designs  between  100  and  2000 
gates,  we  used  Equation  4-4  to  increase  the  height  and  Equation  4-5  to  increase  the 
width. 


htjact  =  NUM^COLS  x  2/100.0  -f  NUM^NODESimm.Q  (4-4) 

wdjact  =  NUM.ROWS  x  4/100.0  4-  NUM.NODESimm.i)  (4-5) 

Equation  4-4  is  based  on  the  premise  that  more  columns  require  more  horizontal 
wires,  and  horizontal  wires  increase  the  height.  It  is  also  based  on  the  premise  that 
a  dense  layout  (more  nodes)  requires  more  area  for  wires  than  a  sparse  layout.  This 
is  reflected  in  the  number  of  nodes  factor.  Equation  4-5  is  based  on  the  premise  that 
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more  rows  require  more  vertical  wires,  and  vertical  wires  increase  the  width.  This 
equation  also  takes  into  account  the  number  of  nodes. 

For  designs  with  more  than  2000  gates  where  the  number  of  columns  was  greater 
than  or  equal  to  the  number  of  rows,  we  used  Equation  4-6  to  increase  the  height 
and  Equation  4-7  to  increase  the  width.  The  height  factor  did  not  change  in  this 
case  (Equation  4-6  is  the  same  as  Equation  4-4).  However,  as  the  size  of  the  design 
increased  and  because  there  are  more  columns  than  rows,  the  width  factor  decreased. 
This  is  reflected  in  Equation  4-7. 

hijact  =  NUM.COLS  X  2/100.0  -f  NUM^NODESim^)^.^  (4-6) 

wd.fact  =  NUM-ROWS  X  4/150.0  (4-7) 

For  designs  with  more  than  2000  gates  where  the  number  of  columns  was  less  than 
the  number  of  rows,  we  used  Equation  4-8  to  increase  the  height  and  Equation  4-9  to 
increase  the  width.  In  this  case  the  height  factor  did  increase,  but  the  width  factor 
is  the  same  as  Equation  4-5. 

htjact  =  NUM^COLS  x  2/60.0  +  NU M.NODESlmm.^  (4-8) 

wdjact  =  NUMMWS  x  4/100.0  -f  NUM^NODES/mOO.O  (4-9) 

These  wire  area  equations  were  derived  from  an  analysis  of  several  different  size 
layouts  with  different  numbers  of  rows  and  columns.  Using  these  equations  for  wire 
area  along  with  the  minimum  height  and  width  of  a  layout  due  to  just  the  gates 
produced  estimates  within  18%  of  actual  values  for  most  designs. 

To  see  how  well  the  layout  tool  model  and  the  area  estimator  performed  on  the 
test  designs,  we  estimated  height  and  width  for  each  one.  Then  we  compared  these 
height  and  width  estimates  to  those  from  the  actual  layouts.  Table  9  shows  the 
height  and  width  estimates  we  obtained  for  the  example  test  designs.  This  table 
shows  the  percent  error  between  these  estimates  and  the  height  and  width  values 
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from  the  actual  layouts.  All  values  except  percent  error  are  expressed  in  lambda 
units  (1  lambda  unit  =  2  microns).  This  table  shows  that  for  these  test  designs, 
the  height  and  width  estimates  are  within  20%  of  the  actual  values.  In  addition, 
this  table  shows  that  the  model  works  well  for  area  estimation  on  designs  other  than 
adders. 


Table  9  Area  Estimates  for  Example  Test  Designs 


Design 

Name 

Number 
of  Gates 

Actual 

Height 

Estimated 

Height 

Percent 

Error 

Actual 

Width 

Estimated 

Width 

Percent 

Error 

add4 

43 

459 

478 

4.14% 

303 

298 

-1.65% 

addl2 

147 

1123 

1314 

17.01% 

891 

979 

9.88% 

add24 

303 

1667 

1593 

-4.44% 

1654 

1950 

17.90% 

absl6 

208 

1283 

1214 

-5.38% 

968 

933 

-3.62% 

mult4 

163 

907 

891 

-1.76% 

1472 

1539 

4.55% 

4.6  Summary 

This  chapter  described  the  details  of  implementing  the  area  and  delay  estimation 
method.  We  used  a  small  set  of  training  designs  based  on  adders  and  a  small  set  of 
test  designs  (adders  and  non-adders)  to  show  how  the  method  works  for  modeling 
a  target  layout  tool  and  using  the  model  to  estimate  area  and  delay.  Using  these 
designs,  we  found  delay  estimates  within  7%  and  area  estimates  within  18%.  We 
found  no  discernable  difference  between  estimating  area  and  delay  for  adders  or  non¬ 
adders,  even  though  the  model  was  based  on  a  training  set  with  all  adders.  This 
is  due  to  the  fact  that  the  non-adder  designs  are  actually  made  up  of  adders  so  an 
adder-based  training  set  produced  good  results.  This  chapter  also  described  how 
we  evaluated  the  performance  of  the  learning  system  and  the  estimator.  The  next 
chapter  describes  the  set  of  experiments  we  conducted  to  evaluate  and  validate  the 
method  using  other  training  designs  and  larger  test  designs. 
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5.0  EXPERIMENTS  AND  RESULTS 


Experimentation  illustrates  the  identification  of  the  general  design  features,  the 
relative  placement  concepts,  the  formulation  of  the  training  set,  the  derivation  of  the 
tool  model,  and  the  application  of  this  model  to  real  world  designs. 

There  are  five  major  sets  of  experiments.  The  first  set  of  experiments  determines 
how  well  the  learning  system  learns  in  this  domain.  This  involves  using  abstract 
designs  as  training  sets  and  “stacking  the  deck”  with  the  concepts  to  be  learned.  This 
means  manually  assigning  a  relative  placement  concept  to  each  training  instance  in 
an  way  such  that  a  known  relationship  exists  between  the  training  instance  and  this 
concept.  This  way,  the  training  sets  are  “stacked”  with  known  relationships  that 
map  the  features  to  the  associated  concepts.  Now  the  task  is  to  see  how  well  the 
learning  system  finds  these  relationships.  As  was  expected,  it  proved  necessary  to 
make  adjustments  to  the  learning  system’s  biases  so  it  could  find  the  relationships 
and  produce  the  appropriate  rules.  As  described  in  Section  3. 3. 2. 2,  some  of  these 
adjustments  include  setting  the  range  of  certainty  for  acceptable  rules  and  setting 
the  maximum  number  of  features  that  may  appear  in  the  left-hand  side  of  each  rule. 

The  second  set  of  experiments  validates  the  approach  of  using  machine  learning 
to  model  layout  tools.  This  involves  using  training  sets  based  on  microprocessor  ALU 
components,  but  the  concepts  are  not  “stacked”.  The  relative  placement  concepts 
are  derived  from  actual  layouts.  The  experiments  in  this  set  address  the  following 
research  issues: 

•  Is  the  machine  learning  method  better  than  randomly  assigning  relative  place¬ 
ment  concepts? 

•  Is  the  machine  learning  method  better  than  using  one  relative  placement  con¬ 
cept  for  all  gate  pairs? 

•  Is  it  possible  to  model  the  target  layout  tool  using  different  configurations 
and/or  execution  parameters? 


89 


•  Can  the  layout  tool  model  handle  a  wide  variety  of  design  sizes? 

•  Do  we  have  a  sufficient  number  of  test  designs? 

•  Is  using  the  model  faster  than  producing  actual  layouts? 

The  third  set  of  experiments  shows  how  well  the  model  works  on  real  world 
benchmark  designs.  The  generalized  set  of  rules  that  model  the  layout  tool’s  input- 
to-output  relationships  was  applied  to  large  and  completely  functional  designs  to 
obtain  area  and  delay  estimates.  These  estimates  were  then  compared  to  area  and 
delay  values  based  on  actual  layouts  of  these  designs  and  a  measure  of  closeness  was 
derived.  This  measure  of  closeness  (percentage  difference)  shows  how  well  the  rules 
model  the  layout  tool  for  non-trivial  designs. 

The  fourth  set  of  experiments  shows  the  generality  of  the  modeling  method.  Parts 
of  the  implementation  were  modified  and  another  layout  tool  was  modeled  using  the 
same  method.  The  required  modifications  are  described  in  detail  and  the  resultant 
models  of  the  two  layout  tools  are  compared  and  evaluated. 

The  fifth  set  of  experiments  shows  where  we  stand  in  relation  to  other  estimation 
techniques.  We  compare  our  approach  using  machine  learning  to  other  methods  of 
estimating  layouts  and  deriving  area  and  delay  estimates. 

All  of  the  experiments  in  this  chapter  (except  for  the  benchmarks)  were  performed 
on  a  Sun  SPARC2  workstation  with  593  Mbytes.  The  benchmark  experiments  were 
performed  on  a  Sun  SPARC20  workstation  with  1737  Mbytes.  The  learning  system, 
Rule  Learner,  is  installed  on  a  DEC  3100  with  65  Mbytes. 

5.1  Learning  Results 

This  section  discusses  results  from  the  first  set  of  experiments.  These  results  were 
obtained  by  using  the  modeling  method  on  the  Artistll  layout  tool  from  the  Keystone 
Tool  Suite. 

The  purpose  of  the  experiments  described  in  this  section  is  to  determine  if  the 
learning  system  is  capable  of  learning  in  the  computer-aided  design  domain.  For  each 
of  these  experiments,  a  small  test  design  was  created  and  its  layout  was  “stacked”. 
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This  means  the  layout  was  not  obtained  from  a  layout  tool.  Instead,  the  gates  were 
placed  manually  according  to  their  connectivity.  This  manual  placement  is  based  on 
the  premise  that  if  the  gates  are  placed  according  to  some  order,  the  learning  system 
should  be  able  to  learn  the  rules  that  represent  this  order. 

One  of  the  test  designs  created  for  this  set  of  experiments  consists  of  four  separate 
circuits.  Each  circuit  contains  seven  gates  for  a  total  of  28  gates  (378  gate  pairs). 
Each  circuit  has  its  own  set  of  inputs  and  only  one  output.  There  are  no  connections 
between  the  four  circuits.  One  circuit  contains  only  3-input  NAND  gates,  another 
contains  only  3-input  NOR  gates,  another  contains  only  4-input  NAND  gates,  and 
the  last  one  contains  only  4-input  NOR  gates.  Figure  31  shows  the  graph  for  the 
3-input  NAND  gate  circuit.  The  graphs  for  the  other  three  circuits  are  identical, 
except  for  the  function  of  the  gates  and/or  the  number  of  inputs.  The  label  inside 
each  node  in  the  figure  is  the  designation  of  the  gate  as  it  is  listed  in  the  flattened 
description  (.fla)  file.  These  designations  are  also  used  in  the  grouping  (.grp)  file  to 
indicate  the  placement  of  each  gate. 


Figure  31  Test  Design  Graph 

The  “stacked”  grouping  file  is  shown  in  Figure  32.  The  designations  in  Figure  31 
correspond  to  the  first  row  in  this  file.  To  “stack”  the  layout,  we  manually  placed 
each  of  the  four  circuits  in  its  own  row.  In  each  row,  we  placed  the  circuit’s  external 
output  gate  in  the  middle.  We  placed  the  two  gates  that  provide  inputs  to  this  gate 
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one  on  each  side.  We  placed  the  remaining  four  gates  on  the  ends  of  the  row,  two  on 
each  end. 


(ts3stk 

(row  rl  g21  g20  gl8  g27  gl9  g22  g23  nil) 
(row  r2  gl6  gl7  gl3  g26  gl2  gl4  gl5  nil) 
(row  r3  glO  gll  g7  g25  g6  g8  g9  nil) 

(row  r4  g5  g4  gl  g24  gO  g2  g3  nil) 
(column  al  rl  r2  r3  r4) 

) 


Figure  32  “Stacked”  Placement  File 

Using  this  placement  gives  each  of  the  four  rows  two  pairs  of  gates  (eight  pairs 
total)  that  are  near  each  other  and  share  a  common  destination.  These  are  the  gates 
on  the  ends  of  the  rows.  In  Figure  32,  these  are  gate  pairs  g20-g21  and  g22-g23. 
Gate  pair  g20-g21  share  gate  gl8  as  a  common  destination  and  gate  pair  g22-g23 
share  gate  gl9  as  a  common  destination.  We  designed  this  arrangement  to  produce 
a  rule  that  says  eight  of  the  gate  pairs  that  are  near  share  a  common  destination. 
There  are  also  four  pairs  of  gates  on  each  row  that  are  near  each  other  and  logically 
connected.  In  Figure  32,  these  are  gate  pairs  gl8-g20,  gl8-g27,  gl9-g22,  and  gl9-g27. 
We  designed  this  placement  to  produce  rules  that  deal  with  the  logically  connected 
feature.  This  “stacked”  placement  also  causes  every  gate  on  the  same  row  to  have  the 
same  function,  area,  and  delay  and  causes  gates  on  different  rows  to  have  different 
functions,  areas,  and  delays.  We  expected  the  learning  system  to  produce  rules  that 
reflect  this. 

Using  this  setup,  we  produced  two  training  sets.  One  of  the  training  sets  used  all 
features  as  described  in  Section  3.3.2. 1  while  the  other  used  only  relational  features. 
We  did  this  to  see  if  individual  gate  features  produced  better  rules  that  covered  more 
gate  pairs  than  relational  features  alone.  Using  only  relational  features,  the  learning 
system  produced  six  rules.  These  are  shown  in  Figure  33.  (We  did  not  calculate  the 
certainty  factors  because  we  stacked  the  placement.  In  addition,  because  the  training 
set  was  stacked,  these  rules  have  very  low  negative  coverage.)  These  six  rules  covered 
a  little  more  than  half  of  the  total  number  of  training  instances  (192  out  of  378). 
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As  expected,  the  common  destination  feature  showed  up  in  one  of  the  rules  for  the 
near  concept.  This  rule  indicated  that  eight  out  of  the  24  total  gate  pairs  with  the 
concept  of  near  shared  a  common  destination.  However,  it  also  indicated  that  four 
out  of  the  354  gate  pairs  that  are  not  near  share  a  common  destination.  We  examined 
the  layout  and  found  this  was  true.  In  each  of  the  four  rows,  the  gates  on  each  side 
of  the  middle  gate  share  a  common  destination,  but  are  not  near  each  other. 

Found  6  Good  Rules 

192  examples  covered  (out  of  378) 

(log_conn  in)  ==>  near 
{  pos=  0.33,  neg=  0.00,  cf=  0.000 
p=8,  n=0,  tp=24,  tn=354  } 

(com.dest  yes)  ==>  near 
{  pos=  0.33,  neg=  0.01,  cf=  0.000 

p=8,  n=4,  tp=24,  tn=354  } 

(al_a2  >)  ==>  far 

•C  pos=  0.59,  neg=  0.00,  cf=  0.000 

p=172,  n=0,  tp=294,  tn=84  > 

(hl_h2  >)  ==>  far 

{  pos=  0.59,  neg=  0.00,  cf=  0.000 

p=172,  n=0,  tp=294,  tn=84  } 

(wl_w2  >)  ==>  far 

{  pos=  0.59,  neg=  0.00,  cf=  0.000 

p=172,  n=0,  tp=294,  tn=84  } 

(nil_ni2  >)  ==>  far 

{  pos=  0.59,  neg=  0.00,  cf=  0.000 

p=172,  n=0,  tp=294,  tn=84  } 


Figure  33  Set  of  Rules  Based  on  Stacked  Placement 

The  logical  connection  feature  also  showed  up  in  one  of  the  rules  for  near.  How¬ 
ever,  it  was  not  as  we  expected.  This  rule  indicated  that  eight  of  the  gate  pairs  with 
the  concept  of  near  are  logically  connected  with  the  first  gate  providing  an  input  to 
the  second  gate  [log ..conn  =  in).  This  is  true  in  the  “stacked”  layout.  However,  there 


93 


are  another  eight  gate  pairs  that  are  logically  connected  with  the  first  gate  receiving 
an  output  from  the  second  gate  [logjconn  =  out).  This  did  not  show  up  in  the  rules. 
It  turns  out  that  the  learning  system  found  both  rules  and  opted  to  keep  only  one. 
We  verified  this  by  changing  the  admissible  logical  connection  features  values  to  yes 
and  no  instead  of  in,  out,  and  none.  All  16  gate  pairs  then  showed  up  in  a  single 
rule.  This  happened  because  we  effectively  increased  the  generality  of  the  logical 
connection  feature.  The  target  layout  tool  appears  to  treat  connected  gate  pairs  the 
same  no  matter  which  way  they  are  connected.  It  does  not  matter  if  gate  1  provides 
an  input  to  gate  2  or  gate  2  provides  an  input  to  gate  1,  this  gate  pair  is  considered 
connected  and  is  treated  as  such. 

No  rules  showed  up  for  the  concept  of  same.  This  was  not  expected  at  first,  but 
further  examination  of  the  layout  showed  why.  While  it  is  true  that  all  gate  pairs 
with  the  concept  of  same  have  the  same  function  and  area,  it  is  also  true  that  the  gate 
pairs  where  one  gate  is  from  the  3-input  NAND  circuit  and  the  other  gate  is  from 
the  3-input  NOR  circuit  have  the  same  area  features.  This  is  true  for  the  gate  pairs 
from  the  4-input  NAND  and  4-input  NOR  circuits  as  well.  These  gate  pairs  have 
the  concept  of  far  and  produce  negative  examples  for  the  area  features.  Therefore, 
the  negative  threshold  of  a  good  rule  was  exceeded  and  the  learning  system  did  not 
produce  the  rule.  In  addition,  there  is  currently  no  feature  that  indicates  if  the  two 
gates  in  the  pair  have  the  same  function,  so  rules  indicating  this  cannot  be  produced. 

Four  rules  showed  up  for  the  concept  of  far.  Three  dealt  with  relative  area  features 
and  one  dealt  with  the  relative  number  of  internal  inputs.  Each  of  the  rules  dealing 
with  area  indicated  the  proper  attribute  value  (different  areas,  heights,  and  widths) 
and  the  correct  coverage  (172  out  of  294).  However,  we  did  not  expect  to  see  the 
rule  dealing  with  the  relative  number  of  internal  inputs.  We  examined  the  3-input 
NAND  circuit  graph  (Figure  31)  to  see  why  it  was  produced.  Three  of  the  seven  gates 
in  this  circuit  have  two  internal  inputs  and  one  external  input  each  (gl8,  gl9,  and 
g27)  while  the  other  four  (g20,  g21,  g22,  and  g23)  have  only  external  inputs.  This  is 
true  for  each  circuit  and  therefore  each  row.  When  the  three  gates  with  internal  and 
external  inputs  from  each  row  are  paired  up  with  the  four  gates  with  only  external 
inputs  from  the  other  three  rows,  the  number  of  internal  inputs  will  be  different.  This 
produced  the  rule  dealing  with  the  relative  number  of  internal  inputs. 
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Running  the  learning  system  after  including  individual  gate  features  in  the  train¬ 
ing  set  produced  19  more  rules  and  more  coverage.  As  we  expected,  the  same  rela¬ 
tional  feature  rules  as  before  showed  up.  However,  the  majority  of  additional  rules 
dealing  with  individual  gate  features  made  no  sense,  even  with  the  “stacked”  layout. 
Each  of  these  rules  had  low  coverage  and  therefore  did  not  reflect  any  real  patterns 
in  the  layout.  This  suggests  that  when  dealing  with  gate  pairs,  only  features  that 
describe  how  the  two  gates  relate  to  each  other  should  be  used. 

We  have  good  confidence  that  the  results  from  these  “stacked”  experiments  and 
others  indicate  the  learning  system  is  very  capable  of  learning  in  this  domain. 

5.2  Validation  Results 

For  the  validation  experiments,  we  used  a  set  of  simple  microprocessor  components 
as  training  and  test  designs.  These  designs  are  listed  in  Table  10,  showing  each 
design’s  name,  function,  the  number  of  nodes  (primitive  gates)  required,  and  if  the 
design  was  used  as  a  training  design,  test  design,  or  both.  Because  the  training  sets 
were  always  made  up  of  several  training  designs,  we  did  not  run  the  risk  of  overfitting 
the  learning  process  by  training  and  testing  with  the  same  design.  We  only  used  one 
design  at  a  time  to  test  the  model’s  performance.  We  believe  these  designs  represent  a 
varied  cross-section  of  different  types  and  sizes  of  microprocessor  components.  (Later 
in  this  section,  we  perform  a  statistical  analysis  to  show  that  these  designs  represent 
a  sufficient  sample.) 

We  obtained  these  designs  by  using  the  TinkerTool  system.  The  Tinker  Tool 
system  automatically  generates  VHDL  descriptions  of  the  designs,  processes  them, 
and  eventually  produces  layouts  using  the  Artistll  layout  tool.  TinkerTool  also  runs 
Magic  to  obtain  the  height  and  width  of  the  layout  and  IRSIM  to  obtain  the 
critical  path  delay.  We  used  these  as  the  actual  area  and  delay  for  error  calculations. 
Each  test  design  has  its  own  actual  height,  width,  and  delay. 

Before  we  describe  the  experiments  and  their  results,  we  need  to  identify  how 
we  measure  the  performance  of  the  estimation  method  using  machine  learning  to 
model  layout  tools.  There  are  two  main  sources  of  error  in  this  approach.  One  source 
of  error  comes  from  applying  the  layout  tool  model  to  target  designs.  This  error 
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Table  10  Set  of  Training/Test  Designs 


Design 

Name 

Function 

Number 
of  Gates 

Number  of 
Transistors 

Type 

add4 

4-b  adder 

43 

146 

Training/Test 

adds 

S-b  adder 

95 

330 

Training/Test 

add  12 

12-b  adder 

147 

514 

Training/Test 

addl6 

16-b  adder 

199 

69S 

Training/Test 

add24 

24- b  adder 

303 

1066 

Test 

add32 

32-b  adder 

407 

1434 

Training/Test 

absl6 

16-b  absolute  value 

20S 

63S 

Test 

inc4 

4-b  incrementer 

23 

6S 

Test 

incS 

S-b  incrementer 

51 

156 

Test 

incl6 

16-b  incrementer 

107 

332 

Test 

sub4 

4-b  subtracter 

4S 

160 

Test 

subs 

S-b  subtracter 

104 

352 

Test 

sub  16 

16-b  subtracter 

216 

736 

Test 

addsub4 

4-b  adder/sub 

64 

224 

Test 

addsubS 

S-b  adder/sub 

132 

464 

Test 

addsubl6 

16-b  adder/sub 

26S 

944 

Test 

mult  2 

2-b  multiplier 

40 

112 

Test 

mult4 

4-b  multiplier 

163 

550 

Test 

mult  6 

6-b  multiplier 

3S9 

137S 

Test 

multS 

S-b  multiplier 

711 

2574 

Test 

mult  16 

16-b  multiplier 

2959 

1103S 

Test 

is  the  aggregation  of  certainty  factors,  the  order  of  relative  placement  concepts  in 
the  gate  pair  list,  and  how  well  the  learning  system  learned  from  the  training  set. 
A  figure  of  merit  to  evaluate  this  error  compares  area  and  delay  estimates  using 
the  estimator  on  layout  information  from  an  actual  layout  and  using  the  estimator 
on  layout  information  from  applying  the  model.  This  figure  of  merit  is  called  the 
modeling  error. 

The  other  source  of  error  is  due  to  the  estimator  itself.  This  estimator  uses 
models  of  wire  area  and  wire  delay  to  derive  area  and  delay  estimates  from  layout 
information.  The  wire  area  model  uses  statistical  data  to  provide  the  average  wire 
length  between  positions  in  the  layout  and  the  required  area  for  routing  the  wires 
among  the  gates.  The  wire  delay  model  provides  a  worse  case  delay  due  to  the  wires 
between  gates.  A  figure  of  merit  to  evaluate  this  error  compares  the  actual  area  and 
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delay  (from  using  Magic  and  IRSIM  on  an  actual  layout)  to  area  and  delay  estimates 
from  using  the  estimator  on  information  from  an  actual  layout.  This  is  called  the 
estimator  error. 

Another  figure  of  merit  compares  area  and  delay  estimates  from  using  the  esti¬ 
mator  on  layout  information  from  applying  the  model  to  actual  area  and  delay  from 
using  Magic  and  IRSIM  on  an  actual  layout.  This  is  called  the  overall  error.  This  er¬ 
ror  is  a  combination  of  the  modeling  error  and  the  estimator  error  because  we  use  the 
estimator  on  layout  information  produced  by  applying  the  model  to  target  designs. 

The  overall  error  shows  how  well  the  method  works  overall  and  the  modeling  error 
shows  how  well  the  model  produced  by  the  learning  system  performs.  The  estimator 
error  shows  how  well  the  area  and  delay  estimator  performs  on  layout  information. 
Now  that  we  have  defined  the  figures  of  merit  for  measuring  performance,  we  can 
describe  the  experiments. 

The  first  experiment  in  validating  the  method  determines  if  machine  learning  is 
better  than  randomly  assigning  relative  placement  concepts.  For  this  experiment,  we 
used  three  test  designs:  an  8-bit  adder,  a  16-bit  adder,  and  a  Tbit  multiplier.  The 
adders  provided  two  of  the  same  type  of  design  with  different  sizes  and  the  multiplier 
provided  a  different  design  type,  also  with  a  different  size.  Table  11  shows  the  overall 
error  in  delay  estimates  for  machine  learning  and  randomly  assigning  concepts.  For 
the  8-bit  adder,  machine  learning  produced  better  delay  estimates.  For  the  16-bit 
adder,  both  machine  learning  and  randomly  assigning  concepts  had  about  the  same 
overall  error.  This  indicates  that  in  some  cases  randomly  assigning  concepts  may 
come  close  to  classifying  the  correct  concepts.  For  the  Tbit  multiplier,  we  see  that 
randomly  assigning  concepts  produces  a  much  worse  error  than  machine  learning 
does. 

Table  11  Machine  Learning  vs.  Random  Concept  Assignments  for  Delay  Estimates 


Design 

Name 

Number 
of  Gates 

Machine 

Learning 

Random 

Assignments 

adds 

95 

-3.68% 

addl6 

199 

3.43% 

mult4 

163 

-1.41% 

27.20% 
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We  also  looked  at  the  modeling  error  in  area  estimates  for  these  three  designs.  Ta¬ 
ble  12  shows  a  comparison  of  the  modeling  errors  for  height  and  width  from  machine 
learning  and  randomly  assigning  concepts.  In  a  few  of  the  cases  but  not  all,  ran¬ 
domly  assigning  concepts  produced  a  better  estimate.  This  is  to  be  expected  because 
sometimes  a  guess  may  be  closer  than  a  well  thought  out  calculation  using  estimates. 
However,  this  table  shows  that  randomly  assigning  relative  placement  concepts  does 
not  consistently  produce  better  estimates  than  machine  learning. 

Table  12  Machine  Learning  vs.  Random  Concept  Assignments  for  Area  Estimates 


Design 

Name 

Number 
of  Gates 

Machine 

Learning 

(height) 

Machine 

Learning 

(width) 

Random 

Assignments 

(height) 

Random 

Assignments 

(width) 

adds 

95 

5.11% 

-2.56% 

7.96% 

-9.50% 

add  16 

199 

-5.40% 

0.29% 

-3.69% 

9.78% 

mult4 

163 

8.55% 

-0.86% 

3.09% 

2.91% 

The  next  experiment  determines  if  using  one  relative  placement  concept  for  all 
gate  pairs  is  better  than  machine  learning.  Using  one  concept  means  that  we  manually 
assign  the  same  concept  to  each  gate  pair.  For  this  experiment,  we  used  the  concepts 
of  nears,  close-1,  and  far.l.  The  concept  of  nears  indicates  the  two  gates  are  located 
next  to  each  other  on  the  same  row.  The  concept  of  close-1  indicates  the  two  gates 
are  located  two  to  three  positions  apart.  The  concept  of  far- 1  indicates  the  two  gates 
are  located  six  to  seven  positions  apart.  Table  13  shows  a  comparison  of  the  overall 
error  for  delay  estimates  from  machine  learning  and  from  using  one  concept  for  all 
gate  pairs.  This  table  shows  that  in  the  case  of  delay,  machine  learning  produces 
better  estimates.  For  example,  the  d-bit  multiplier  has  an  overall  error  of  -1.41% 
for  delay  when  machine  learning  is  used.  It  has  an  overall  error  of  -30.78%  when  all 
connected  gate  pairs  are  assigned  the  concept  of  nears,  an  overall  error  of  -23.72% 
when  all  gate  pairs  are  assigned  the  concept  of  close-1,  and  an  overall  error  of -9.60% 
when  all  gate  pairs  are  assigned  the  concept  of  far-1.  Each  of  these  errors  is  much 
worse  than  the  overall  error  from  assigning  the  concepts  through  machine  learning. 

Table  14  shows  the  results  from  this  experiment  using  height  estimates.  The 
numbers  indicate  the  modeling  errors  for  height  estimates.  In  the  case  of  the  8-bit 
adder,  machine  learning  produced  better  height  estimates.  However,  for  the  other 
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Table  13  Machine  Learning  vs.  One  Concept  for  Delay  Estimates 


Design 

Name 

Number 
of  Gates 

Machine 

Learning 

One  Concept 
(near_s) 

One  Concept 
(close_l) 

One  Concept 
(far_l) 

adds 

95 

-3.68% 

-38.61% 

-31.72% 

-17.93% 

add  16 

199 

3.43% 

-52.80% 

-47.62% 

-37.28% 

mult4 

163 

-1.41% 

-30.78% 

-23.72% 

two  designs,  machine  learning  did  not  produce  better  estimates.  At  first  glance,  this 
may  indicate  that  using  one  concept  is  better  than  machine  learning.  However,  this 
table  only  shows  the  modeling  errors  for  height  estimates.  When  we  consider  the 
modeling  errors  for  width  estimates,  the  story  changes.  Table  15  shows  the  results 
for  width  estimates.  In  most  of  the  cases  here,  machine  learning  produced  much 
better  estimates.  Because  height  and  width  are  two  inseparable  components  of  area, 
we  must  consider  them  together.  Table  16  shows  the  modeling  errors  for  total  area 
estimates  based  on  machine  learning  and  on  using  only  one  concept.  This  table 
shows  that  in  most  cases,  machine  learning  provides  better  total  area  estimates  than 
assigning  one  concept  to  all  gate  pairs. 

Table  14  Machine  Learning  vs.  One  Concept  for  Height  Estimates 


Design 

Name 

Number 
of  Gates 

Machine 

Learning 

One  Concept 
(near_s) 

One  Concept 
(close_l) 

One  Concept 
(far_l) 

adds 

95 

7.96% 

7.96% 

10.81% 

add  16 

199 

-2.46% 

-1.23% 

0.00% 

mult4 

163 

5.82% 

3.09% 

3.09% 

Table  15  Machine  Learning  vs.  One  Concept  for  Width  Estimates 


Design 

Name 

Number 
of  Gates 

Machine 

Learning 

One  Concept 
(near_s) 

One  Concept 
(close_l) 

One  Concept 
(far_l) 

adds 

95 

-2.56% 

-0.49% 

-2.68% 

-11.69% 

0.29% 

2.25% 

-4.21% 

-5.38% 

mult4 

-0.86% 

-5.36% 

-1.26% 

-2.71% 

Another  issue  brought  out  in  this  research  is  if  the  modeling  method  can  be 
used  to  model  the  target  layout  tool  using  different  configurations  and/or  execution 
parameters.  For  example,  we  can  change  the  number  of  iterations  for  the  Artistll 
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Table  16  Machine  Learning  vs.  One  Concept  for  Total  Area  Estimates 


Design 

Name 

Number 
of  Gates 

Machine 

Learning 

One  Concept 
(near_s) 

One  Concept 
(close_l) 

One  Concept 
(far_l) 

adds 

95 

2.42% 

7.43% 

5.07% 

-2.15% 

add  16 

199 

-5.12% 

-0.27% 

-5.39% 

-5.38% 

mult4 

163 

2.62% 

-4.50% 

-2.93% 

-4.36% 

layout  tool.  A  higher  number  of  iterations  produces  layouts  with  less  area  because 
the  cost  function  is  based  on  area.  This  issue  was  raised  when  we  tried  to  run  Artistll 
on  the  16-bit  multiplier  design  using  10,000  iterations.  Artistll  ran  for  over  two  weeks 
with  no  results  before  the  host  machine  had  to  be  rebooted.  The  required  length  of 
time  to  produce  a  layout  for  this  design  using  10,000  iterations  was  too  long,  so  we 
had  to  reduce  the  number  of  iterations.  We  reduced  the  number  to  100  and  got  a 
layout  in  28,746  CPU  seconds,  which  is  close  to  eight  hours.  However,  we  realized 
that  using  a  model  based  on  10,000  iterations  to  produce  area  and  delay  estimates 
and  then  comparing  these  estimates  against  actual  values  from  a  real  layout  based  on 
only  100  iterations  would  be  an  invalid  comparison.  Therefore,  we  needed  to  build 
another  training  set  based  on  training  design  layouts  using  only  100  iterations  of 
Artistll. 

We  used  the  same  five  training  designs  as  before  (4-bit  adder,  8-bit  adder,  12- 
bit  adder,  16-bit  adder,  and  32-bit  adder)  but  produced  layouts  for  each  one  using 
Artistll  running  only  100  iterations.  The  learning  system  processed  this  training  set 
and  produced  a  set  of  rules  in  144.4  CPU  seconds.  The  results  of  this  experiment 
are  shown  in  Table  17.  The  actual  values  in  the  second  column  come  from  an  actual 
Artistll  layout  using  only  100  iterations.  The  third  column  shows  the  estimated 
values  from  applying  our  10,000  iteration  based  model  in  place  of  Artistll.  The  fourth 
column  shows  the  overall  errors  between  the  10,000  iteration  model  and  using  Artistll 
at  100  iterations.  The  fifth  column  shows  the  estimated  values  from  applying  our  100 
iteration  model.  The  sixth  and  last  column  shows  the  overall  errors  between  the  100 
iteration  model  and  using  Artistll  at  100  iteratons. 

Table  17  shows  that  the  estimates  from  the  model  using  100  iterations  are  better 
than  the  estimates  from  the  model  using  10,000  iterations.  The  width  estimates  are 
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Table  17  Comparison  of  Using  Different  Models  on  16-Bit  Multiplier 


Physical 

Dimension 

Actual 

Value 

Estimated 

Value 

(10,000) 

Overall 

Error 

Estimated 

Value 

(100) 

Overall 

Error 

delay 

994.8 

1065.32 

7.09% 

1041.48 

4.69% 

height 

18755 

14864 

-20.75% 

15643 

-16.59% 

width 

9921 

9970 

0.49% 

9970 

0.49% 

area 

186068355 

148194080 

-20.35% 

155960710 

-16.18% 

the  same  because  the  widest  row  in  the  estimated  layout  for  both  models  happened 
to  have  the  same  width.  The  height  estimate  using  the  10,000  iteration  model  is  un¬ 
derestimated  because  the  model  is  based  on  10,000  iterations  of  Artistll  which  should 
produce  a  smaller  layout.  This  also  causes  the  total  area  to  be  underestimated.  This 
experiment  shows  that  it  is  possible  to  create  different  models  for  different  configu¬ 
rations  and/or  execution  parameters  for  the  target  layout  tool.  This  experiment  also 
shows  that  the  different  models  do  reflect  different  configurations  of  the  target  layout 
tool. 

The  next  set  of  results  addresses  the  issue  of  using  the  layout  tool  model  on  a 
wide  variety  of  design  sizes.  We  wanted  to  determine  how  well  the  layout  tool  model 
worked  and  how  well  the  overall  estimation  method  worked.  Eight  graphs  are  used  to 
show  the  results  from  applying  the  layout  tool  model  to  several  test  designs.  These 
graphs  show  percentage  error  vs.  the  wide  range  of  design  sizes  found  in  the  set  of 
test  designs.  (In  each  of  these  graphs,  the  horizontal  axis  is  nonlinear  due  to  the  wide 
range  of  varying  design  sizes.)  To  obtain  these  results,  we  used  five  training  designs 
and  built  two  training  sets.  We  used  a  d-bit  adder,  an  8-bit  adder,  a  12-bit  adder,  a 
16-bit  adder,  and  a  32-bit  adder.  One  training  set  was  built  using  relative  placement 
concepts  from  Artistll  layouts  using  10,000  iterations.  (This  is  the  same  training  set 
as  described  in  Section  4.1).  The  other  training  set  was  built  using  relative  placement 
concepts  from  Artistll  layouts  using  only  100  iterations.  We  needed  the  100  iteration 
model  because  it  took  too  long  using  10,000  iterations  to  produce  actual  layouts  for 
the  large  designs.  We  had  to  run  Artistll  with  only  100  iterations  to  get  the  actual 
layouts  to  compare  against.  Therefore,  in  order  to  make  a  valid  comparison,  we  had 
to  produce  a  model  of  Artistll  using  only  100  iterations. 
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We  processed  each  of  these  training  sets  through  the  learning  system  and  produced 
a  set  of  rules  for  each  one.  These  two  sets  of  rules  became  our  two  models.  (It  took 
149.8  CPU  seconds  to  produce  the  10,000  iteration  model  and  144.4  CPU  seconds  to 
produce  the  100  iteration  model.  A  more  in-depth  comparison  of  these  two  models 
is  provided  later.)  We  applied  the  10,000  iteration  model  to  each  of  the  smaller  test 
designs  (those  to  the  left  of  the  vertical  line  on  the  graphs)  to  obtain  predicted  layout 
information.  We  used  our  estimator  and  this  predicted  layout  information  to  obtain 
height,  width,  area,  and  delay  estimates  for  each  test  design.  Likewise,  we  applied 
the  100  iteration  model  to  the  larger  designs  (those  to  the  right  of  the  line)  in  the 
same  manner.  It  should  be  noted  that  Artistll  requires  a  larger  number  of  iterations 
for  larger  designs  in  order  to  produce  better  layouts.  Using  a  reduced  number  of 
iterations  may  produce  layouts  in  a  reasonable  period  of  time  (CPU  hours  vs.  CPU 
weeks),  but  the  layouts  are  not  very  good  in  terms  of  area.  Therefore,  even  though 
it  provides  a  valid  comparison  with  Artistll  using  100  iterations,  the  100  iteration 
model  will  not  produce  good  results. 

Because  each  test  design  has  its  own  actual  area  and  delay,  we  are  able  to  deter¬ 
mine  the  modeling  error  and  the  overall  error.  The  little  “x”  symbols  in  each  graph 
indicate  the  percentage  error  vs.  the  number  of  nodes  in  the  design.  For  eample. 
Figure  34  shows  a  graph  of  the  modeling  error  for  height  estimates.  (It  should  be 
pointed  out  again  that  the  modeling  error  indicates  how  well  the  model  performs. 
It  does  not  include  any  errors  due  to  the  estimator.)  This  graph  shows  that  the 
modeling  error  is  well  within  10%  for  most  of  the  design  sizes.  This  graph  also  shows 
that  some  of  the  smaller  designs  are  outside  the  10%  limit.  This  is  because  small 
differences  in  small  numbers  create  bigger  percentage  errors. 

Figure  35  shows  a  graph  of  the  modeling  error  for  width  estimates.  This  graph 
shows  that  the  modeling  error  here  is  also  well  within  10%  for  all  of  the  designs. 
Figure  36  shows  the  modeling  error  for  total  area  estimates.  This  graph  shows  that 
most  of  the  designs  have  a  modeling  error  within  10%.  The  other  designs’  modeling 
errors  are  outside  this  range  because  the  individual  modeling  errors  for  height  and 
width  did  not  offset  each  other  and  added  up  in  one  direction.  However,  these  errors 
are  still  well  within  20%.  Figure  37  shows  a  graph  of  the  modeling  error  for  delay 
estimates.  This  graph  shows  that  the  modeling  error  for  delay  is  within  10%  for 
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of  nodes 


Figure  34  Modeling  Error  for  Height  Estimates 


designs  up  to  3000  gates,  but  then  it  starts  to  drop  off  to  around  -15%.  This  is  due 
to  larger  designs  having  more  gates  on  the  critical  path  and  the  errors  due  to  the 
wire  delay  model  start  to  add  up.  However,  these  errors  are  still  within  30%. 
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Figure  35  Modeling  Error  for  Width  Estimates 

These  last  four  graphs  show  that  the  model  performs  quite  well  for  a  wide  range 
of  design  sizes.  The  next  four  graphs  look  at  how  the  overall  method  performs  for 
these  designs.  (It  should  be  pointed  out  again  that  the  overall  error  includes  the 
modeling  error  and  the  estimator  error.) 

Figure  38  shows  a  graph  of  the  overall  error  for  height  estimates.  This  graph 
shows  the  overall  error  for  most  designs  falls  within  10%.  However,  in  a  few  of  the 
small  to  medium  designs,  the  height  is  overestimated.  This  indicates  these  designs 
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Figure  36  Modeling  Error  for  Area  Estimates 
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Figure  37  Modeling  Error  for  Delay  Estimates 

have  more  horizontal  wires  routed  over  the  gates  so  the  wire  area  model  overestimates 
the  horizontal  wire  area.  This  graph  also  shows  that  the  height  is  underestimated  for 
large  designs.  In  these  designs,  there  are  more  horizontal  wires  between  gates  that 
require  more  wire  area. 

Figure  39  shows  a  graph  of  the  overall  error  for  width  estimates.  This  graph  also 
shows  that  most  of  the  designs  are  within  10%,  with  only  a  few  designs  outside  this 
range.  This  indicates  these  designs  have  vertical  wires  routed  over  gates  so  the  wire 
area  model  overestimates  the  vertical  wire  area.  However,  all  of  the  width  estimates 
are  still  within  30%.  Figure  40  shows  a  graph  of  the  overall  error  for  total  area 
estimates.  Here  again,  most  of  the  overall  error  is  within  the  10%  range.  However, 
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Figure  38  Overall  Error  for  Height  Estimates 


a  lot  of  designs  have  their  overall  errors  outside  this  range.  This  is  because  the 
individual  height  and  width  errors  did  not  offset  each  other  and  added  up  in  one 
direction.  But,  even  with  this  happening,  all  of  the  errors  are  still  within  our  goal  of 
30%. 
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Figure  39  Overall  Error  for  Width  Estimates 

Figure  41  shows  a  graph  of  the  overall  error  for  delay  estimates.  This  graph  shows 
the  delay  estimates  for  most  designs  are  also  within  10%.  However,  there  are  a  few 
designs  outside  this  range.  The  design  with  close  to  6000  gates  has  an  underestimated 
delay  around  -30%.  If  we  look  back  at  Figure  37  we  can  see  that  this  design  had  a 
underestimated  modeling  error  of  about  -15%.  This  modeling  error  combined  with 
the  error  from  the  wire  delay  model  underestimated  the  delay  even  more. 
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Figure  40  Overall  Error  for  Area  Estimates 
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Figure  41  Overall  Error  for  Delay  Estimates 


At  this  point,  a  comparison  of  the  10,000  iteration  model  and  the  100  iteration 
model  is  warranted.  Table  18  compares  the  10,000  iteration  model  of  Artistll  to 
the  100  iteration  model.  Although  there  are  the  same  number  of  instances  in  each 
model  (1144),  there  is  different  supporting  evidence  for  each  of  the  concepts.  The 
number  of  instances  are  the  same  because  both  models  are  based  on  the  same  set  of 
training  designs.  The  supporting  evidence  is  different  because  the  actual  layouts  for 
each  training  design  are  different. 

For  example,  in  the  10,000  iteration  model,  we  have  123  instances  with  the  concept 
of  close^l  and  9  rules.  In  the  100  iteration  model,  we  have  231  instances  with  the 
concept  of  close^l  and  18  rules.  For  this  concept,  there  is  twice  as  much  support  in 
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Table  18  Comparison  of  Two  Artistll  Models 


Relative 

Placement 

Concept 

Number  of 
Instances 
(10,000  Its.) 

Number 
of  Rules 
(10,000  Its.) 

Number  of 
Instances 
(100  Its.) 

Number 
of  Rules 
(100  Its.) 

near_s 

82 

7 

101 

8 

near_d 

20 

0 

4 

1 

close_l 

123 

9 

231 

18 

close_2 

130 

5 

74 

6 

far_l 

119 

6 

68 

6 

far_2 

85 

0 

82 

0 

way_out 

585 

8 

584 

8 

total 

1144 

35 

1144 

47 

the  100  iteration  model  as  there  is  in  the  10,000  iteration  model.  Table  18  shows 
similar  comparisons  for  the  remaining  concepts. 

As  far  as  the  contents  of  the  rules,  there  are  many  differences.  For  example,  there 
is  a  rule  in  the  10,000  iteration  model  that  indicates  a  concept  of  far.l  for  any  gate 
pair  that  have  an  unequal  number  of  internal  inputs  and  the  same  function.  This 
rule  has  a  certainty  factor  of  0.639.  There  is  a  rule  in  the  100  iteration  model  that 
indicates  a  concept  of  close.  1  for  the  gate  pairs  meeting  the  same  conditions  with  a 
certainty  factor  of  0.900.  This  is  because  there  is  more  evidence  in  the  100  iteration 
model’s  training  set  for  this  relationship.  Our  examination  of  the  10,000  iteration 
model  and  the  100  iteration  model  shows  they  are  indeed  two  different  models. 

Now  that  we  have  determined  how  well  the  model  performs  and  how  well  the 
overall  estimation  method  performs,  we  can  answer  a  big  question  raised  in  proposing 
this  research.  This  question  is:  “What  percentage  error  in  area  and  delay  estimation 
is  both  achievable  and  acceptable?”  As  described  in  the  introduction  to  this  thesis, 
a  paper  on  layout  estimation  that  Nourani  and  Papachristou  presented  at  the  30th 
ACM/IEEE  Design  Automation  Conference  in  1993  states  a  difference  of  less 
than  10%  is  considered  very  good,  but  difficult  to  achieve.  In  addition,  the  VLSI  CAD 
community  considers  a  difference  of  20-30%  to  be  both  acceptable  and  achievable 
Therefore,  this  research  adopted  the  goal  of  trying  to  achieve  less  than  30% 
difference  while  attempting  to  approach  10%  as  much  as  possible.  It  is  evident  from 
these  graphs  that  we  achieved  our  goal  of  less  than  30%  difference  with  the  majority 
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of  estimates  falling  within  10%.  This  is  true  for  both  the  layout  tool  model  and  the 
overall  estimation  method. 

Another  key  issue  is  how  well  our  set  of  test  designs  samples  the  population  space 
of  designs.  To  address  this  issue,  we  calculated  a  confidence  interval  A 

confidence  interval  shows  how  well  our  set  of  test  designs  samples  the  population 
space  and  indicates  if  it  is  a  truly  representative  sample.  For  this  analysis,  we  used 
the  number  of  gates  for  the  first  20  designs  in  Table  10.  (We  did  not  include  the  16-bit 
multiplier  because  its  large  size  perturbated  the  results.)  The  confidence  interval  was 
found  using  Equation  5.2,  where  x  is  the  mean,  a  is  the  standard  deviation,  t  is  the 
critical  value,  and  n  is  the  number  of  samples.  The  critical  value  came  from  a  table 
and  is  based  on  the  degrees  of  freedom  and  the  desired  confidence  level.  For  a 
sample  size  of  20,  there  are  19  degrees  of  freedom.  If  we  wish  to  use  a  confidence 
level  of  99%,  the  critical  value  t  is  2.86. 

Confidence  Interval  =  x  ziz  —7=  (5-1) 

y/n 

For  our  set  of  test  designs,  x  =  185.9,  a  =  163.4,  and  n  =  20.  Using  these  values 
in  Equation  5.2  gives  us  a  confidence  interval  of  81.4  to  290.4,  at  a  99%  confidence 
level.  This  indicates  that  we  have  sufficiently  sampled  a  population  of  designs  with  a 
mean  number  of  gates  between  81.4  and  290.4.  In  order  to  sample  a  population  with 
a  higher  mean,  we  would  have  to  use  more  test  designs  with  more  gates.  However, 
we  are  limited  to  test  designs  using  32  bits  or  less  due  to  restrictions  in  our  VHDL 
compiler.  This  means  we  would  have  to  build  several  test  designs  using  combinations 
of  smaller  designs. 

Another  worthwhile  analysis  looks  at  the  densities  of  the  Artistll  layouts  versus 
the  densities  of  our  estimated  layouts.  Layout  density  is  defined  as  the  number  of 
gates  per  unit  area.  To  find  the  density  of  a  given  layout,  we  divide  the  number  of 
gates  by  the  total  area.  For  example,  if  a  design  has  512  gates  and  a  total  area  of 
15000  sq.  microns,  the  density  would  be  0.0341  (512/15000).  For  this  analysis,  we 
examined  both  our  suite  of  test  designs  and  the  benchmark  designs.  (The  benchmark 
designs  are  described  in  Section  5.3.)  For  our  test  designs,  the  average  density  of  the 
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Artistll  layouts  is  187.70E-6,  while  the  average  density  for  our  estimated  layouts  is 
176.44E-6.  The  average  difference  between  the  densities  of  Artistll  layouts  and  our 
estimated  layouts  is  -3.25%.  This  shows  that  our  estimated  layouts  have  densities 
very  close  to  that  of  Artistll  layouts. 

For  the  benchmark  designs,  the  average  Artistll  layout  density  is  30.55E-6.  (This 
indicates  the  benchmark  designs  are  less  dense  than  the  test  designs,  i.e.,  there  are 
fewer  gates  per  unit  area.  This  makes  sense  because  the  benchmark  designs  have 
a  lot  more  gates  and  the  total  area  is  mostly  due  to  interconnection  wiring.)  The 
average  density  for  our  estimated  layouts  is  33.70E-6.  The  average  difference  between 
the  densities  of  the  Artistll  layouts  and  our  estimated  layouts  for  the  benchmarks  is 
7.28%.  This  shows  that  even  for  large  complex  designs,  our  estimated  layouts  have 
very  similar  densities. 

The  final  set  of  results  in  this  section  shows  that  using  the  layout  tool  model 
is  much  faster  than  producing  an  actual  layout.  In  fact,  these  results  show  that 
pre-processing  the  target  design,  applying  the  model,  and  even  deriving  area  and 
delay  estimates  can  be  performed  much  faster  than  just  producing  an  actual  layout. 
Table  19  shows  these  execution  times  for  a  wide  variety  of  test  designs.  The  first 
column  shows  the  test  design.  The  second  column  shows  the  size  of  the  design  by  the 
number  of  gates.  The  next  three  columns  show  the  time  it  takes  to  pre-process  the 
design,  apply  the  model  to  produce  a  list  of  gate  pairs  and  their  predicted  relative 
placement  concepts,  and  to  derive  area  and  delay  estimates.  The  area  and  delay 
estimate  time  includes  producing  an  estimated  layout  and  exhaustively  searching 
each  design  twice  to  find  the  critical  path  for  both  high-low  and  low-high  transitions 
on  the  critical  path’s  output.  The  next  to  last  column  shows  the  total  time  for  using 
the  model  which  is  the  sum  of  the  times  for  pre-processing,  applying  the  model,  and 
deriving  estimates.  The  last  column  shows  the  time  it  takes  for  the  Artistll  layout 
tool  to  just  produce  a  layout.  Layouts  for  all  of  the  test  designs  except  for  the  last 
three  were  produced  by  Artistll  using  10,000  iterations.  Due  to  their  large  size,  the 
last  three  designs  in  the  table,  multl6^  big,  and  realbig  required  a  reduced  number  of 
iterations  (100)  in  order  for  Artistll  to  produce  a  layout  at  all!  In  order  to  keep  the 
comparisons  valid,  we  built  a  different  training  set  that  models  Artistll  using  only 
100  iterations  and  used  the  resultant  model  to  obtain  these  execution  times. 
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Table  19  Execution  Time  Comparison  (CPU  sec) 


Design 

Number 

Pre- 

Applying 

Estimating 

Total 

Artistll 

Name 

of  Gates 

Processing 

Model 

Area  and  Delay 

Time 

Time 

add4 

43 

0.13 

0.2 

0.16 

0.49 

840 

adds 

95 

0.29 

0.3 

0.37 

0.96 

2278 

add  12 

147 

0.52 

0.4 

0.78 

1.70 

5385 

add  16 

199 

0.96 

0.7 

1.16 

2.82 

8278 

add24 

303 

1.91 

1.5 

2.41 

5.82 

11886 

add32 

407 

3.12 

3.6 

3.99 

10.71 

18735 

absl6 

20S 

0.S5 

0.6 

1.26 

2.71 

6676 

inc4 

23 

0.40 

0.1 

0.11 

0.61 

405 

incS 

51 

O.SO 

0.2 

0.15 

1.15 

980 

incl6 

107 

0.27 

0.3 

0.40 

0.97 

3408 

sub4 

4S 

0.10 

0.2 

0.18 

0.48 

828 

subs 

104 

0.30 

0.4 

0.48 

1.18 

2374 

sub  16 

216 

1.70 

0.6 

1.37 

3.67 

7788 

addsub4 

64 

0.15 

0.4 

0.25 

0.80 

1398 

addsubS 

132 

0.42 

0.4 

0.62 

1.44 

4896 

addsubl6 

26S 

1.45 

1.0 

1.79 

4.24 

10224 

mult  2 

40 

0.90 

0.2 

0.13 

1.23 

653 

mult  4 

163 

0.63 

0.7 

0.89 

2.22 

4106 

mult  6 

3S9 

3.14 

3.5 

3.68 

10.32 

25213 

multS 

711 

10.39 

7.2 

11.44 

29.03 

47696 

mult  16 

2959 

190.47 

56.9 

195.20 

442.57 

28746 

big 

6264 

93S.S4 

217.6 

1059.00 

2215.44 

65493 

realbig 

1252S 

3S52.49 

879.4 

2873.84 

7605.73 

359289 

It  is  evident  from  this  table  that  using  the  layout  tool  model  is  orders  of  magnitude 
faster  than  just  producing  an  actual  layout.  For  example,  the  multS  test  design,  an 
8-bit  multiplier,  required  47,696  CPU  seconds  for  Artistll  to  produce  a  layout.  Using 
the  layout  tool  model  instead  only  requires  29.03  CPU  seconds,  and  this  includes 
deriving  area  and  delay  estimates.  Even  for  large  designs  like  realbig,  using  the  layout 
model  is  much  faster  than  running  Artistll.  This  is  especially  important  when  you 
consider  that  we  only  used  100  iterations  of  Artistll  for  the  large  designs.  Overall, 
combining  these  execution  time  results  with  the  area  and  delay  estimate  results  shows 
that  this  method  of  modeling  layout  tools  does  produce  accurate  estimates  very 
quickly. 
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5.3  Benchmark  Results 


This  section  describes  the  results  from  applying  our  model  to  benchmark  designs. 
(These  benchmarks  were  obtained  through  MCNC,  Center  for  Microelectronic  Sys¬ 
tems  Technologies,  in  North  Carolina  Because  these  benchmark  designs  each 

contain  more  than  2000  gates,  we  had  to  use  a  reduced  number  of  iterations  (100) 
of  Artistll  just  to  get  a  layout.  Therefore,  we  used  our  model  of  Artistll  at  100 
iterations  to  derive  the  area  and  delay  estimates.  Table  20  shows  the  actual  height 
and  width  from  the  Artistll  layouts  and  compares  these  to  the  height  and  width 
estimates  from  using  our  model.  These  results  show  that  we  are  within  our  goals  for 
height  and  width  estimates  on  the  benchmark  designs. 


Table  20  Height  and  Width  Estimates  for  Benchmark  Designs 


Design 

Name 

Number 
of  Gates 

Actual 

Height 

Estimated 

Height 

Estimated 

Width 

Percent 

Error 

GCD16 

2004 

8666 

9833 

13.47% 

7233 

6084 

-15.89% 

DIV16 

2085 

8858 

7376 

-16.73% 

6316 

6317 

0.02% 

HALS 

3621 

9235 

8052 

-12.81% 

9627 

8191 

-14.92% 

GCD32 

3669 

10747 

11339 

5.51% 

13246 

11532 

-12.94% 

DIV32 

4177 

14299 

16361 

14.42% 

10383 

11384 

9.64% 

HAL16 

8416 

26690 

24680 

-7.53% 

16214 

18368 

13.28% 

Table  21  shows  the  actual  delay  obtained  through  IRSIM  simulations  of  the  bench¬ 
mark  designs  and  the  delay  derived  from  information  produced  by  applying  our  model 
of  Artistll  to  the  benchmark  designs.  It  should  be  noted  here  that  up  until  now,  our 
method  has  only  been  tested  on  purely  combinational  circuits.  These  results  show 
that  our  method  is  applicable  to  and  performs  very  well  on  sequential  circuits  as  well 
as  combinational  circuits. 

Table  22  shows  a  comparison  of  the  execution  times  for  our  method  and  for  using 
Artistll  to  obtain  a  layout.  The  times  for  using  our  method  include  pre-processing  the 
design,  applying  the  model,  and  estimating  area  and  delay.  The  Artistll  execution 
times  are  based  on  100  iterations.  These  results  show  that  our  method  is  orders  of 
magnitude  faster  than  just  producing  a  layout  using  Artistll  with  a  reduced  number 
of  iterations,  even  for  large  complex  designs. 
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Table  21  Delay  Estimates  for  Benchmark  Designs 


Design 

Name 

Number 
of  Gates 

Actual 
Delay  (ns) 

Estimated 
Delay  (ns) 

Percent 

Error 

GCD16 

2004 

158.9 

129.51 

-18.49% 

DIV16 

2085 

208.6 

199.71 

-4.26% 

HALS 

3621 

195.9 

176.68 

-9.81% 

GCD32 

3669 

331.7 

327.78 

-1.18% 

DIV32 

4177 

344.7 

336.15 

-2.48% 

HAL16 

8416 

749.6 

686.31 

-8.44% 

Table  22  Execution  Time  Comparison  for  Benchmark  Designs  (CPU  sec) 


Design 

Name 

Number 
of  Gates 

Pre- 

Processing 

Applying 

Model 

Estimating 
Area  and  Delay 

Total 

Time 

Artistll 

Time 

GGD16 

2004 

36.88 

18.27 

75.63 

130.78 

6513 

DIV16 

2085 

38.16 

11.52 

92.22 

141.90 

5948 

HALS 

3621 

126.41 

51.46 

283.49 

461.36 

13940 

GCD32 

3669 

415.03 

60.09 

282.45 

757.57 

54213 

DIV32 

4177 

507.50 

68.85 

375.45 

951.80 

50293 

HAL16 

8416 

2015.30 

305.58 

1602.84 

3923.72 

217432 

5.4  Modeling  Other  Layout  Tools 


This  section  describes  how  the  modeling  method  can  be  used  to  model  a  different 
layout  tool.  This  time  the  target  layout  tool  is  TimberWolf.  The  Timber  Wolf  layout 
tool  is  described  in  Section  2.2.3.  This  section  also  presents  results  from  using  the 
TimberWolf  model  to  estimate  area  and  delay.  These  results  are  presented  at  the 
end  of  this  section. 

Because  this  is  a  different  layout  tool,  we  had  to  implement  the  method  in  a 
different  way.  The  method  itself  was  not  changed,  just  the  implementation  details. 
TimberWolf  uses  standard  cells  rather  than  primitive  gates,  so  we  modified  the  im¬ 
plementation  to  handle  standard  cells.  The  node  library  that  provides  information 
on  the  primitive  gates  was  replaced  with  a  node  library  that  provides  the  same  in¬ 
formation  for  standard  cells.  As  described  in  Section  3. 3. 2.1,  the  modeling  method 
requires  a  physical  characteristics  library  that  provides  information  such  as  a  cell’s 
total  area,  height,  width,  intrinsic  delay  (for  both  high-low  and  low-high  transitions). 
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the  number  of  inputs,  and  the  cell’s  function. 

Another  modification  was  changing  the  parsing  routine  in  the  Match  tool.  Instead 
of  parsing  a  flattened  file  description,  the  parsing  routine  has  to  parse  a  cell-based 
netlist.  This  netlist  provides  the  same  type  of  interconnection  and  topology  informa¬ 
tion  that  the  flattened  file  description  provides.  The  parser  also  had  to  be  modified 
to  parse  the  layout  description  file  from  TimberWolf.  This  file  provides  the  same 
type  of  relative  placement  information  that  the  .grp  file  does  for  Artistll. 

The  wire  delay  model  and  wire  area  model  also  require  modification.  A  standard 
cell  layout  is  different  from  a  layout  using  primitive  gates,  so  these  models  must 
reflect  this.  The  wire  delay  model  is  still  a  function  of  the  length  of  wire  connected 
to  the  output  and  the  number  of  loads  on  the  output,  but  now  it  must  be  based  on 
standard  cells.  Because  we  are  using  standard  cells,  the  wire  area  model  can  be  based 
on  the  required  channel  width.  This  means  a  wire  area  model  such  as  Rent’s  rule 
or  Left  Edge  Algorithm  can  be  used  with  good  results. 

Once  these  modifications  were  made,  the  modeling  process  is  the  same.  We  build 
training  sets  using  the  same  training  designs,  but  with  training  instances  based  on 
cell  pairs  (instead  of  gate  pairs)  and  relative  placement  information  from  layouts 
produced  by  TimberWolf.  The  learning  process  is  the  same  as  well.  The  learning 
system  trains  on  the  training  set  and  produces  a  set  of  rules.  The  set  of  rules  becomes 
the  model  of  the  input-to-output  relationships  that  result  from  using  TimberWolf. 
Applying  the  model  to  obtain  layout  information  is  done  in  the  same  way  as  before. 
The  target  designs  are  pre-processed  and  the  rules  are  applied.  A  list  of  cell  pairs 
and  their  predicted  relative  placement  concepts  is  generated.  This  list  provides  the 
necessary  layout  information  so  area  and  delay  can  be  estimated. 

After  making  modifications  to  the  parsing  routine  and  creating  a  new  node  library 
based  on  standard  cells,  we  produced  a  model  of  TimberWolf.  It  only  took  a  couple 
of  hours  to  make  the  modifications  to  the  parsing  routine  and  producing  the  model 
of  TimberWolf  only  took  1008  CPU  seconds.  The  training  set  is  based  on  five  adder 
designs:  a  4-bit,  8-bit,  12-bit,  16-bit,  and  32-bit  adder.  The  training  instances  are 
based  on  cell  pairs  that  are  logically  connected.  Relative  placement  concepts  came 
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from  layouts  produced  by  TimberWolf.  We  also  used  the  same  reduced  Partial  Do¬ 
main  Model  as  we  did  in  Section  4.1.  This  Partial  Domain  Model  is  copied  here  in 
Figure  42  for  easy  reference. 

near_s ,  near 
com_dest : 
com_src : 
al_a2: 
hl_h2: 
wl_w2: 
ntl_nt2: 
nxl_nx2; 
nil_ni2: 
funl_fun2: 
name: 

Figure  42  Partial  Domain  Model  for  TimberWolf 

Using  this  training  set,  the  learning  system  produced  29  rules.  Figure  43  shows 
six  of  the  29  rules,  with  one  rule  for  each  concept.  No  rules  were  produced  for  the 
concept  of  far-2.  This  is  because  there  was  no  supporting  evidence  in  the  training 
set  for  this  concept.  We  compared  these  rules  to  those  we  obtained  for  the  Artistll 
model  and  found  some  differences. 

Table  23  compares  the  models  for  Artistll  and  TimberWolf.  First  of  all,  the 
distribution  of  concepts  is  different  for  the  two  models.  The  logically  connected  cells 
in  the  TimberWolf  layouts  tend  to  be  placed  closer  together  than  the  gates  in  the 
Artistll  layouts.  The  TimberWolf  training  set  has  237  instances  of  cell  pairs  being 
placed  next  to  each  other  on  the  same  row  (concept  of  nears).  This  is  close  to  half 
of  the  total  number  of  training  instances  (620).  This  occurs  because  the  TimberWolf 
layout  tool  fine  tunes  the  placement  in  its  third  and  last  stage  of  execution  so  less 
wire  area  is  required.  (The  three  stages  of  the  TimberWolf  layout  tool  are  explained 
in  Section  2.2.3.) 

The  Artistll  training  set  has  only  82  instances  of  gate  pairs  having  the  concept  of 
nears.  This  is  only  about  7%  of  the  total  number  of  training  instances  (1144).  On 
the  other  end  of  the  relative  placement  spectrum,  the  Artistll  training  set  has  585 


d,  close.l,  close_2,  far_l,  far_2,  way_out.  |  concepts 
yes,  no. 
yes ,  no . 


ignore 
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(com.dest  no)  (funl_fun2  !=)  ==>  near_s 

{  pos=  0.78,  neg=  0.49,  cf=  0.616 
p=186,  n=186,  tp=237,  tn=383  } 

(nxl_nx2  ==)  (nil_ni2  ==)  ==>  near.d 

{  pos=  0.27,  neg=  0.08,  cf=  0.702 
p=6,  n=50,  tp=22,  tn=598  } 

(nxl_nx2  !=)  (nil_ni2  ==)  ==>  close_l 

{  pos=  0.26,  neg=  0.13,  cf=  0.646 
p=35,  n=65,  tp=137,  tn=483  } 

(funl_fun2  ==)  ==>  close_2 

{  pos=  0.18,  neg=  0.06,  cf=  0.734 
p=18,  n=30,  tp=101,  tn=519  }■ 

(com_dest  yes)  ==>  far_l 
{  pos=  0.50,  neg=  0.30,  cf=  0.611 
p=29,  n=171,  tp=58,  tn=562  } 

(com.dest  yes)  (ntl_nt2  !=)  ==>  way_out 

{  pos=  0.49,  neg=  0.30,  cf=  0.604 
p=22,  n=174,  tp=45,  tn=575  } 


Figure  43  Example  Set  of  Rules  for  Timber  Wolf 

instances  of  gate  pairs  being  located  10  or  more  positions  apart  (concept  of  way-out). 
This  is  a  little  over  half  of  the  total  number  of  training  instances.  The  TimberWolf 
training  set  has  only  45  instances  for  the  concept  of  way-out.  This  is  about  7%  of 
the  total  number. 

The  distribution  of  rules  among  the  concepts  for  both  models  is  about  the  same. 
This  shows  there  is  about  an  equal  amount  of  supporting  evidence  for  each  of  the 
concepts  in  both  models.  (Except  for  the  concepts  of  neared  and  far^2  in  the  Artis- 
tll  model  and  the  concept  of  far-2  in  the  TimberWolf  model  which  shows  there  is 
no  supporting  evidence  for  these  concepts.)  A  roughly  even  amount  of  supporting 
evidence  indicates  the  set  of  training  designs  adequately  represents  a  good  sample 
of  each  concept.  This  also  indicates  that  the  concepts  chosen  and  the  scope  of  the 
concepts  sufficiently  handles  the  necessary  relative  placement  information. 
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Table  23  Comparison  of  Artistll  and  Timber  Wolf  Models 


Relative 

Placement 

Concept 

Number  of 
Instances 
(Artistll) 

Number 
of  Rules 
(Artistll) 

Number  of 
Instances 
(TimberWolf) 

Number 
of  Rules 
(TimberWolf) 

near_s 

82 

7 

237 

5 

near_d 

20 

0 

22 

5 

close_l 

123 

9 

137 

5 

close_2 

130 

5 

101 

5 

far_l 

119 

6 

58 

1 

far_2 

85 

0 

20 

0 

way_out 

585 

8 

45 

8 

total 

1144 

35 

620 

29 

While  some  of  the  rules  are  the  same  for  both  models,  the  majority  of  the  rules 
are  different.  Having  some  of  the  rules  the  same  in  both  models  is  expected  because 
we  are  using  the  same  set  of  training  designs  and  the  same  Partial  Domain  Model 
(design  features)  for  both  models.  It  is  also  expected  that  more  rules  are  different 
because  these  two  target  layout  tools  operate  at  different  levels  and  their  layout  styles 
are  different.  For  example,  there  is  a  rule  in  both  models  that  indicates  a  concept 
of  nears  for  any  two  logically  connected  gates  or  cells  that  do  not  share  a  common 
destination  and  have  different  functions.  The  certainty  factor  for  this  rule  in  the 
Artistll  model  is  0.603  and  the  certainty  factor  in  the  Timberwolf  model  is  0.616. 
These  certainty  factors  are  very  close  and  indicates  there  is  close  to  an  equal  amount 
of  evidence  in  both  models  for  this  rule. 

We  can  make  other  comparisons  showing  differences  in  the  two  models.  For 
example,  there  are  no  rules  in  the  TimberWolf  model  indicating  concepts  for  cells  of 
different  heights  or  with  the  same  heights.  This  is  because  all  of  the  standard  cells 
in  the  training  designs  have  the  same  height,  so  no  relationships  exist.  On  the  other 
hand,  there  are  rules  in  the  Artistll  model  that  consider  the  height  features. 

Another  comparison  shows  there  is  a  rule  in  the  Artistll  model  that  indicates  a 
concept  of  waysut  for  gate  pairs  that  have  the  same  width.  This  rule  has  a  certainty 
factor  of  0.670.  In  the  Timber  Wolf  model,  there  are  three  rules  for  cells  with  the  same 
width,  but  they  indicate  three  different  concepts:  close-2  with  a  certainty  factor  of 
0.728,  way-Out  with  a  certainty  factor  of  0.687,  and  neav-d  with  a  certainty  factor  of 
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0.669.  This  indicates  the  Artistll  training  set  only  supports  a  concept  of  way.out  for 
gates  that  have  the  same  width.  On  the  other  hand,  the  TimberWolf  training  set 
supports  three  different  concepts  for  cells  with  the  same  width.  This  is  a  significant 
difference  between  the  two  models. 

One  more  comparison  shows  another  difference  between  the  two  models.  In  the 
TimberWolf  model  there  is  a  rule  that  indicates  a  relative  placement  concept  of  close-2 
for  any  cell  pairs  with  the  same  function.  This  rule  has  a  certainty  factor  of  0.734.  In 
the  Artistll  model,  there  is  a  rule  that  indicates  a  concept  of  way-out  for  gate  pairs 
with  the  same  function.  This  rule  has  a  certainty  factor  of  0.643.  The  same  feature 
indicates  two  different  concepts  in  the  two  models.  The  concepts  and  certainty  factors 
are  different  because  the  information  in  the  training  sets  is  different.  This  along  with 
other  comparisons  confirms  that  the  learning  system  did  learn  different  information 
from  the  two  training  sets. 

Now  that  we  have  compared  the  TimberWolf  model  to  the  Artistll  model,  we  can 
show  how  well  the  TimberWolf  model  works  on  test  designs.  The  test  designs  on 
which  we  used  the  TimberWolf  model  include  a  4-bit,  8-bit,  16-bit,  and  32-bit  adder 
and  a  4-bit,  8-bit,  and  16-bit  multiplier.  Because  they  are  test  designs,  each  one  has 
an  actual  layout  from  TimberWolf  so  we  can  make  valid  comparisons  and  derive  the 
appropriate  errors. 

We  applied  the  TimberWolf  model  to  each  of  these  test  designs  and  produced  a  list 
of  cell  pairs  and  predicted  relative  placement  concepts  for  each  one.  Then  we  used  the 
estimator  to  get  a  preliminary  set  of  estimates.  Table  24  shows  the  area  estimates 
using  the  TimberWolf  model.  These  height  and  width  estimates  are  based  on  an 
estimated  layout  using  the  same  “growing”  technique  as  described  in  Section  4.5. 
The  height  and  width  of  the  individual  cells  comes  from  the  cell  library.  However, 
we  used  the  same  wire  area  models  as  described  in  Section  4.5.  It  is  expected  that 
better  wire  models  will  produce  better  area  estimates.  The  goal  for  modeling  the 
TimberWolf  layout  tool  was  to  see  if  the  method  could  be  implemented  to  handle 
another  layout  tool.  These  results  indicate  that  we  are  in  the  ball  park  for  overall 
error  and  it  is  only  a  matter  of  tweaking  the  area  models  to  get  better  estimates. 
For  example,  as  the  designs  get  larger  the  height  estimates  have  more  negative  error. 
This  indicates  the  current  wire  area  model  does  not  calculate  the  required  height  of 
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the  channels  correctly  and  underestimates  it  more  as  the  number  of  cells  increases. 
The  width  is  mostly  overestimated  because  the  current  model  calls  for  more  vertical 
wire  area  as  a  function  of  the  number  of  rows  and  cells.  In  the  case  of  standard  cells, 
existing  feedthroughs  can  be  used  to  route  the  vertical  wires  and  not  as  much  extra 
wire  area  is  needed. 


Table  24  Height  and  Width  Estimates  from  TimberWolf  Model 


Design 

Name 

Number 
of  Cells 

Actual 

Height 

Estimated 

Height 

Overall 

Error 

Actual 

Width 

Estimated 

Width 

Overall 

Error 

add4 

19 

251 

274 

9.16% 

368 

367 

-0.27% 

adds 

51 

507 

538 

6.11% 

480 

516 

7.50% 

add  16 

115 

687 

670 

-2.47% 

760 

830 

9.21% 

add32 

243 

1259 

1066 

-15.33% 

1016 

1090 

7.28% 

mult4 

79 

671 

670 

-0.15% 

536 

594 

10.82% 

mult  8 

392 

1531 

1330 

-13.13% 

1248 

1450 

16.19% 

mult  16 

1681 

4011 

2914 

-27.35% 

2464 

3226 

30.93% 

To  estimate  delay,  we  used  the  same  wire  delay  models  as  described  in  Section  4.4. 
Table  25  shows  the  delay  estimates  using  the  TimberWolf  model.  These  delay  esti¬ 
mates  are  based  on  the  intrinsic  delays  from  the  cell  library  and  the  wire  delay  models 
from  before.  These  delay  estimates  are  well  within  a  10%  overall  error  and  indicate 
that  the  wire  delay  models  and  the  intrinsic  delays  from  the  cell  library  are  accurate. 


Table  25  Delay  Estimates  from  TimberWolf  Model 


Design 

Name 

Number 
of  Cells 

Actual 

Delay 

Estimated 

Delay 

Overall 

Error 

add4 

19 

11.2 

12.10 

8.04% 

adds 

51 

26.5 

25.90 

-2.26% 

add  16 

115 

58.0 

56.94 

-1.83% 

add32 

243 

122.8 

119.02 

-3.08% 

mult4 

79 

17.8 

17.84 

0.22% 

mult  8 

392 

43.6 

43.34 

-0.60% 

mult  16 

1681 

102.6 

96.14 

-6.30% 

The  main  goal  here  is  to  show  that  the  modeling  method  is  general  enough  so 
it  can  be  used  to  model  other  layout  tools.  We  have  shown  that  it  is  possible,  with 
only  minor  modifications  to  the  implementation,  to  model  another  layout  tool.  This 
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section  described  what  modifications  were  necessary  to  model  the  relationships  from 
the  Timber  Wolf  layout  tool  and  also  showed  that  we  are  well  within  an  acceptable 
range  (30%  overall  error)  for  the  area  and  well  within  our  target  goal  (10%  overall 
error)  for  delay  estimates.  It  is  anticipated  that  minor  changes  to  the  wire  area  and 
wire  delay  models  in  the  estimator  will  produce  even  better  estimates. 

5.5  Comparison  to  Other  Area  and  Delay  Estimators 

This  section  shows  how  our  area  and  delay  estimation  method  compares  to  other 
estimators.  We  compare  results  from  this  estimation  method  to  results  from  the 
other  estimators  described  in  Chapter  2.  However,  a  couple  of  the  estimators  in 
Chapter  2  do  not  lend  themselves  to  valid  comparisons.  For  example,  the  Fasolt 
system  is  designed  for  estimating  layouts  for  macrocell  based  designs  only. 

A  comparison  between  this  and  our  custom  gate  or  even  standard  cell  layout  models 
would  be  invalid.  In  addition,  Knapp  did  not  compare  this  layout  estimator  to  other 
estimation  systems.  He  only  showed  that  his  layout  estimator  could  improve  initial 
layouts  from  16%  to  37%  through  several  reiterations. 

The  layout  area  and  delay  modeling  system  from  Ramachandran  et.  al. 

also  works  on  macrocells  so  a  comparison  would  be  invalid  here  as  well.  However, 
they  did  report  an  error  rate  of  13%  for  area  estimates  and  10%  for  delay  estimates, 
based  on  designs  averaging  around  2000  cells. 

Another  estimator,  the  ADAM  system  produces  a  lower-bound  area  and 

delay  curve  which  shows  different  delay  values  for  given  area  values,  all  for  the  same 
design.  To  make  a  valid  comparison,  we  would  have  to  run  the  ADAM  system  on 
one  of  our  test  designs  to  produce  this  curve.  Then  we  would  have  to  produce  several 
area  and  delay  estimates  for  the  same  design  using  our  layout  tool  model.  We  can  get 
different  area  and  delay  estimates  by  changing  the  number  of  rows  in  the  estimated 
layout.  However,  it  is  impractical  to  run  the  ADAM  system  on  our  designs.  In 
addition,  Jain  et.  al.  compared  their  lower  bound  area-delay  curves  to  area-delay 
estimates  based  on  layouts  produced  by  other  systems,  but  they  did  not  calculate  a 
quantitative  error.  They  could  only  show  that  their  curve  was  close  and  defined  a 
theoretical  lower  bound  for  area  and  delay  trade-offs. 
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We  can  make  valid  comparisons  with  the  other  layout  estimators.  Table  26  shows 
a  comparison  of  results  from  our  area  and  delay  estimation  method  to  results  from 
estimators  by  Nourani  and  in  the  TELE  2.0  and  PLEST  systems. 

Table  26  Comparison  to  Other  Estimators 


System 

Name 

Design 

Name 

Dim. 

Their 

Error 

Our 

Error 

(Nourani) 

mult4 

ht 

3.09% 

0.77% 

wd 

4.22% 

1.83% 

multS 

ht 

4.69% 

-1.60% 

wd 

6.87% 

-15.93% 

TELE  2.0 

mult  16 

area 

-13.91% 

-4.85% 

del 

18.00% 

-6.30% 

add32 

area 

9.62% 

-9.37% 

del 

6.00% 

-3.08% 

PLEST 

multS 

area 

5.8% 

1.05% 

addl6 

area 

10.6% 

6.51% 

mult  16 

area 

3.1% 

-4.85% 

For  example,  Nourani  and  Papachristou  used  their  layout  estimator  on  a  4-bit 
multiplier  design  containing  521  transistors.  They  produced  a  height  estimate  that 
was  off  by  3.09%  from  the  actual  height.  They  produced  a  width  estimate  that  was 
off  by  4.22%.  Our  4-bit  multiplier  test  design  has  550  transistors.  Our  layout  tool 
model  (based  on  Artistll)  produced  a  height  estimate  that  was  off  by  only  0.77%  and 
a  width  estimate  that  was  off  by  only  1.83%.  They  also  produced  estimates  for  an 
8-bit  multiplier  with  2097  transistors.  Their  height  estimate  was  off  by  4.69%  and 
their  width  estimate  was  off  by  6.87%.  Using  the  Artistll  layout  tool  model  on  our 
8-bit  multiplier  (2574  trans.),  the  height  estimate  was  off  by  -1.60%  while  the  width 
estimate  was  off  by  -15.93%.  This  indicates  that  we  are  slightly  better  than  their 
system  or  at  least  very  competitive.  In  addition,  this  layout  estimator  only  produces 
area  estimates  so  we  could  not  compare  delay  estimates. 

We  can  also  compare  our  method  to  the  TELE  2.0  system  The  TELE 

2.0  system  was  used  on  a  design  containing  1829  cells.  This  is  roughly  equivalent 
to  our  16-bit  multiplier  which  has  1681  cells.  (The  results  presented  by  the  TELE 
2.0  system  are  for  total  area.)  For  this  size  of  design,  their  total  area  estimate  was 
off  by  -13.91%.  Because  the  TELE  2.0  system  uses  standard  cells,  we  used  results 
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from  our  TimberWolf  model  for  comparisons.  The  Timber  Wolf  model  gave  us  a  total 
area  estimate  for  the  16-bit  multiplier  that  was  off  by  -4.85%.  The  TELE  2.0  system 
produced  a  delay  estimate  that  was  off  by  18.00%  for  this  design.  Our  delay  estimate 
was  off  by  only  -6.30%. 

Another  example  comparing  our  layout  tool  modeling  method  to  the  TELE  2.0 
system  is  an  adder  design.  Their  adder  design  contains  225  cells.  Our  equivalent 
32-bit  adder  design  has  243  cells.  Their  total  area  estimate  is  off  by  9.62%.  Our  total 
area  estimate  (using  the  TimberWolf  model)  is  off  by  -9.37%.  Their  delay  estimate 
for  this  design  is  off  by  6.00%.  Our  delay  estimate  is  off  by  -3.08%.  This  comparison 
and  the  one  before  indicate  we  are  very  competitive  with  the  TELE  2.0  system.  This 
is  even  using  the  TimberWolf  model  which  is  not  optimized  as  much  as  it  could  be. 

The  PLEST  system  is  another  layout  area  estimator  that  we  can  compare 
our  method  against.  (This  system  does  not  estimate  delay.)  One  of  their  designs  is  an 
8-bit  multiplier.  They  report  a  total  area  estimate  that  is  off  by  5.8%.  Because  this 
system  also  uses  standard  cells,  we  used  our  model  of  TimberWolf.  The  TimberWolf 
model  produced  a  total  area  estimate  that  was  off  by  only  1.05%.  Another  design 
they  reported  results  on  is  a  16-bit  adder.  They  reported  a  area  estimate  that  is  off 
by  10.6%.  The  TimberWolf  model  produced  an  area  estimate  for  our  16-bit  adder 
that  was  only  off  by  6.51%.  They  also  reported  results  on  a  16-bit  multiplier.  Their 
area  estimate  was  off  by  3.1%.  For  our  16-bit  multiplier,  the  area  estimate  was  off 
by  -4.85%. 

These  comparisons  show  that  our  modeling  method  is  very  competitive  in  area 
and  delay  estimation.  We  have  also  shown  that  both  models,  the  Artistll  model  and 
the  TimberWolf  model,  can  estimate  area  and  delay  as  well  if  not  better  than  some 
of  the  current  estimators. 


6.6  Summary 

This  chapter  on  experiments  addressed  several  key  issues  in  this  research.  Through 
experimentation  with  a  number  of  training  designs  and  test  designs,  and  two  target 
layout  tools,  we  have  demonstrated  that  our  modeling  method  is  valid  and  quickly 
produces  accurate  area  and  delay  estimates.  For  example,  we  have  shown  that  the 
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machine  learning  method  produces  better  estimates  than  just  randomly  assigning 
relative  placement  concepts  or  assigning  one  concept  to  all  gate  pairs.  We  have  also 
shown  that  it  is  possible  to  produce  different  models  of  the  same  target  layout  tool 
using  different  connfigurations  and/or  execution  parameters  for  the  layout  tool.  By 
looking  at  the  modeling  error  and  the  overall  error,  we  have  shown  that  we  can  suc¬ 
cessfully  apply  the  model  to  a  wide  range  of  design  sizes.  We  also  demonstrated  that 
we  achieved  our  goal  of  less  than  30%  error  with  most  of  the  estimates  within  10%. 

We  compared  the  execution  times  for  pre-processing  the  target  design,  applying 
the  model,  and  estimating  area  and  delay  to  the  time  it  takes  for  the  target  layout 
tool  to  just  produce  a  layout.  The  modeling  method  is  orders  of  magnitude  faster  and 
includes  estimating  area  and  critical  path  delay.  Our  method  is  faster  even  for  large 
complex  designs.  We  also  showed  good  results  from  applying  our  model  to  benchmark 
designs.  Here  we  showed  that  our  method  can  also  handle  large  complex  sequential 
circuits  as  well  as  combinational  circuits.  Another  set  of  experiments  showed  that  it 
is  possible  to  model  the  input-to-output  relationships  for  other  layout  tools  using  the 
machine  learning  method  with  only  minor  modifications  to  the  implementation.  In 
addition,  we  showed  that  the  resultant  models  for  the  two  layout  tools  we  modeled 
were  quite  different.  Finally,  we  compared  results  from  both  of  the  models  to  other 
area  and  delay  estimators  and  showed  that  we  are  very  competitive. 
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6.0  CONCLUSIONS 


This  chapter  begins  by  summarizing  this  thesis.  It  does  this  by  recapping  the 
motivation  for  the  research,  the  goals  of  the  research,  the  solution,  and  the  accom¬ 
plishments.  It  also  recaps  how  well  the  solution  met  the  goals.  Next,  this  chapter 
discusses  the  important  contributions  from  this  research,  what  things  could  have  been 
done  differently  (hindsight),  and  the  evolution  of  possible  future  research. 

6.1  Thesis  Summary 

The  research  described  in  this  thesis  shows  it  is  possible  to  use  machine  learning 
techniques  to  quickly  derive  accurate  area  and  delay  estimates.  Accurate  estimates 
enhance  the  ability  of  high-level  synthesis  systems  to  produce  optimal  designs.  How¬ 
ever,  the  most  accurate  estimates  are  based  on  information  from  actual  design  layouts. 
These  layouts  require  a  long  time  to  produce,  so  the  high-level  synthesis  system  must 
either  use  inaccurate  estimates  or  spend  a  lot  of  time  waiting  for  a  layout.  Both 
ways  are  ineffective.  Therefore,  the  motivation  for  this  research  was  to  find  a  way  to 
quickly  derive  accurate  area  and  delay  estimates  so  the  overall  effectiveness  of  high- 
level  synthesis  systems  could  be  enhanced.  We  did  this  by  using  machine  learning 
techniques  to  model  the  input-to-output  relationships  that  result  from  using  layout 
tools,  rather  than  modeling  the  layout  itself.  We  have  shown  that  using  the  model 
in  place  of  the  layout  tool  quickly  produces  accurate  area  and  delay  estimates. 

In  Chapter  2  we  looked  at  several  layout  estimators  and  discussed  their  individ¬ 
ual  strengths  and  weaknesses.  We  also  pointed  out  that  each  one  concentrated  on 
modeling  the  layout  itself,  and  not  the  layout  tool.  Then  Chapter  2  described  the 
theory  behind  layout  tools  that  use  a  simulated  annealing  approach.  We  described 
the  simulated  annealing  algorithm  and  looked  at  two  layout  tools  using  this  algo¬ 
rithm,  Artistll  and  TimberWolf.  We  described  these  two  layout  tools  because  we 
intended  to  model  each  one  using  the  method  from  this  research. 
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Chapter  2  finished  by  describing  the  theory  behind  machine  learning  and  dis¬ 
cussing  three  common  implementations:  linear  discriminants,  neural  nets,  and  pro¬ 
duction  rule  based  systems.  We  showed  how  linear  discriminants  and  neural  nets 
did  not  satisfy  our  need  to  directly  check  the  rules  that  described  the  relationships 
between  the  inputs  and  the  predicted  outputs.  Only  a  production  rule  based  system, 
such  as  Rule  Learner,  produced  human  readable  rules  so  we  could  evaluate  them 
during  the  research.  We  also  described  the  Rule  Learner  system  in  detail  and  showed 
that  it  was  very  applicable  for  our  research. 

In  Chapter  3,  we  described  how  we  came  up  with  the  idea  of  using  machine 
learning  to  model  layout  tools.  We  discussed  some  alternative  approaches  such  as 
using  “brain-damaged”  versions  of  existing  layout  tools  or  using  an  extensive  library 
of  designs  and  corresponding  layouts.  We  showed  how  each  of  these  did  not  meet 
our  goal  of  quickly  deriving  accurate  estimates.  This  chapter  then  described  how 
performing  an  input-to-output  analysis  of  the  target  layout  tool  could  lead  to  a 
model  of  the  tool.  Using  this  model  in  place  of  the  actual  tool  would  quickly  provide 
accurate  area  and  delay  estimates.  Because  this  approach  required  a  data  analysis 
tool,  we  looked  at  different  ones  and  decided  on  using  a  machine  learning  system. 

Chapter  3  continued  by  describing  the  approach  using  machine  learning  to  model 
the  input-to-output  relationships  that  result  from  using  layout  tools.  We  introduced 
a  modeling  method  algorithm  and  a  solution  architecture.  We  described  the  two 
main  parts  of  this  algorithm:  producing  the  model  and  applying  the  model.  For  each 
part,  we  described  several  procedures  from  building  a  training  set  to  deriving  area 
and  delay  estimates. 

Chapter  4  expanded  on  this  algorithm  and  solution  architecture  by  describing  in 
detail  an  actual  implementation  of  this  method.  We  used  Artistll  as  the  target  layout 
tool  and  showed  how  to  build  a  training  set  and  analyze  the  training  set  to  produce 
a  layout  tool  model.  We  also  showed  how  we  can  apply  this  model  to  test  designs 
to  obtain  layout  information  and  how  this  information  was  used  to  give  us  accurate 
area  and  delay  estimates. 

In  Chapter  5,  we  described  several  experiments  that  addressed  key  research  issues. 
First,  we  showed  that  it  was  possible  for  a  machine  learning  system  to  learn  in 


124 


the  VLSI  CAD  domain.  We  then  showed  the  machine  learning  method  was  better 
than  randomly  assigning  relative  placement  concepts  or  using  one  relative  placement 
concept  for  all  gate  pairs.  We  also  showed  it  was  possible  to  model  a  target  layout 
tool  using  different  configurations  and/or  execution  parameters  for  the  tool. 

Another  key  issue  was  determining  if  the  layout  tool  model  could  handle  a  wide 
variety  of  design  sizes.  We  showed  that  it  could  by  applying  the  model  to  several 
designs  and  demonstrating  that  both  the  modeling  error  and  overall  error  were  well 
within  our  goals.  We  showed  the  area  and  delay  estimates  for  most  of  these  designs 
had  errors  within  10%  and  none  had  errors  worse  than  30%.  We  also  showed  that 
using  our  model  was  much  faster  than  producing  actual  layouts.  In  fact,  we  demon¬ 
strated  that  our  method  could  process  designs  and  derive  area  and  delay  estimates 
much  quicker  than  a  layout  tool  could  just  produce  a  single  layout,  even  for  large 
complex  designs. 

We  also  applied  the  model  to  benchmark  designs  to  show  that  it  could  handle 
real  world  designs,  including  sequential  as  well  as  combinational  designs.  The  area 
and  delay  estimates  here  were  also  well  within  our  goals. 

Chapter  5  went  on  to  describe  the  generality  of  the  modeling  method  by  using  it 
to  model  another  layout  tool,  TimberWolf.  We  not  only  showed  that  it  was  possible 
to  model  the  input-to-output  relationships  for  another  tool,  we  used  this  model  to 
produce  accurate  estimates  well  within  the  goals. 

Chapter  5  finished  by  comparing  our  method  to  other  area  and  delay  estimators. 
We  showed  that  both  models  (Artistll  and  TimberWolf)  are  very  competitive  with 
current  estimators. 


6.2  Contributions 

The  contributions  from  this  research  are: 

•  A  method  to  obtain  more  accurate  area  and  delay  estimates  in  a  reasonably 
expedient  way  by  modeling  the  input-to-output  relationships  that  result  from 
using  layout  tools.  Graph  application  principles  are  used  to  obtain  accurate 
area  and  delay  estimates  from  these  layout  tool  models. 
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•  A  generalized  and  novel  method  of  characterizing  layout  tools  by  using  an  artifi¬ 
cial  intelligence  based  machine  learning  system.  This  method  has  demonstrated 
the  ability  to  work  across  different  design  architectures  and  on  different  layout 
tools. 

•  A  new  application  domain  for  machine  learning  systems.  This  new  domain 
introduces  gray  areas,  i.e.,  black  and  white  classifications  are  not  necessary.  A 
classification  that  is  “close  enough”  is  sufficient  for  estimation  purposes. 

•  A  set  of  generalized  features  that  sufficiently  describe  the  target  design  so  rela¬ 
tionships  between  these  features  and  layout  concepts  can  be  used  in  estimating 
area  and  delay. 

The  first  contribution  aids  the  design  automation  process  by  decreasing  the  overall 
design  time  while  increasing  the  effectiveness  of  high-level  decision  making  ability.  It 
does  this  by  modeling  the  input-to-output  relationships  that  result  from  using  the 
layout  tool,  and  not  by  analyzing  the  design  itself.  This  model  is  used  in  place 
of  the  layout  tool  to  provide  the  necessary  layout  information  based  on  the  design. 
This  layout  information  along  with  graph  application  principles  is  used  to  derive 
accurate  area  and  delay  estimates  in  a  timely  fashion.  In  addition,  this  method  is 
applicable  to  a  wide  range  of  design  sizes  (up  to  several  thousand  gates)  and  styles 
(both  combinational  and  sequential).  In  fact,  we  have  shown  results  within  our  goals 
for  design  sizes  up  to  12,500  gates. 

The  second  contribution  is  a  novel  way  of  analyzing  and  modeling  layout  tools. 
It  is  a  general  method  using  a  machine  learning  system  as  a  data  analysis  tool  and 
the  resultant  rules  as  a  representation  of  the  input-to-output  relationships  that  result 
from  using  the  layout  tool.  Due  to  its  generality,  this  method  is  applicable  across 
different  design  architectures  and  may  be  used  to  model  different  layout  tools. 

The  third  contribution  shows  that  it  is  possible  to  use  a  machine  learning  system 
as  a  data  analysis  and  modeling  tool  in  a  computer-aided  design  environment.  This 
aids  design  automation  efforts  by  exhibiting  a  novel  way  to  analyze  and  model  layout 
tools  and  aids  artificial  intelligence  research  by  opening  a  new  application  domain  for 
machine  learning.  It  also  shows  that  exact  classifications  are  not  necessary  in  domains 
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where  reasonable  estimates  are  acceptable.  The  first  set  of  experiments  dealing  with 
learning  demonstrates  that  a  machine  learning  system  can  function  effectively  in  a 
computer-aided  design  environment. 

The  fourth  contribution  is  a  set  of  features  that  sufficiently  describe  the  target 
design.  These  features  are  based  on  graph  applications.  Some  features  describe 
the  connectivity,  degree,  and  physical  attributes  of  each  node,  while  the  remaining 
features  describe  the  topology  of  the  design  in  relation  to  each  node,  and  relative 
physical  attributes  of  logically  connected  node  pairs.  These  features  along  with  layout 
concepts  are  important  in  describing  the  input-to-output  relationships  that  result 
from  using  layout  tools.  In  addition,  these  features  can  be  generalized  when  necessary 
so  less  information  is  needed  to  sufficiently  describe  the  design. 

6.3  Hindsight  and  Evolution 

This  section  suggests  future  research  for  the  area  and  delay  estimation  method  by 
looking  back  on  what  could  have  been  done  differently  and  looking  forward  to  new 
directions. 

In  the  area  of  learning,  one  different  tactic  is  to  use  neural  nets  instead  of  a 
production  rule  system.  As  described  in  Section  2.3.3,  we  used  a  production  rule 
learning  system  so  we  could  evaluate  the  rules  directly.  In  some  cases,  it  was  also 
necessary  to  compare  different  sets  of  rules  to  see  how  they  were  different  or  similar. 
However,  after  we  were  confident  in  the  learning  procedure  and  had  a  quality  suite 
of  training  sets,  we  could  use  a  neural  network  to  learn  the  necessary  concepts.  This 
would  speed  up  the  learning  process  and  the  model  application  process.  However,  it 
may  prove  difficult  to  incorporate  physical  limitations  in  the  application  of  a  neural 
net  model.  We  have  no  certainty  factors  to  rank  first  and  second  choice  concepts. 
One  possible  solution  is  to  incorporate  the  physical  limitations  into  the  training  sets. 
This  would  increase  the  complexity  of  the  learning  process,  but  may  yield  better 
model  results. 

Another  different  tactic  is  to  model  Artistll  and  Clusterll  together.  This  means 
using  Clusterll  to  provide  an  initial  clustering  so  Artistll  can  yield  better  layouts. 
This  would  be  helpful  when  dealing  with  large  training  designs.  We  could  get  better 
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layouts  from  which  to  learn.  This  also  gives  chip  designers  more  flexibility  because 
they  can  choose  between  different  models:  some  based  on  Artistll  alone  and  some 
based  on  Artistll  and  Clusterll  together. 

In  the  area  of  estimation,  we  can  do  a  number  of  things  differently.  One  improve¬ 
ment  is  to  check  for  false  paths  in  the  delay  estimator.  (In  this  research,  we  verified 
the  estimated  critical  paths  of  all  the  test  designs  against  the  actual  critical  paths 
found  by  IRSIM.)  It  is  possible  for  a  target  design  to  have  the  estimated  worse  case 
path  turn  out  to  be  unachievable  for  any  input  data  vector.  An  estimator  that  checks 
for  this  would  enhance  a  chip  designer’s  confidence  in  the  estimated  delay. 

Another  estimation  improvement  is  to  use  better  wire  area  models.  In  this  re¬ 
search  we  used  a  very  simple  estimation  for  wire  area.  More  columns  requires  more 
horizontal  wires  and  more  rows  requires  more  vertical  wires.  A  better  method  might 
be  to  use  Rent’s  rule  or  the  Left  Edge  Algorithm  However,  this  would 

require  a  better  placement  technique.  In  this  research,  we  placed  nodes  into  an  es¬ 
timated  layout  so  they  would  satisfy  the  relative  distance  between  node  pairs.  We 
gave  no  consideration  to  interconnections  among  more  than  two  nodes.  Using  Rent’s 
rule  or  the  left  edge  algorithm  produced  overly  large  wire  areas.  Perhaps  learning  the 
concept  of  net  types  would  lead  to  better  placement.  Instead  of  placing  two  nodes 
according  to  their  predicted  relative  distance,  we  would  place  all  nodes  from  a  single 
net  into  the  estimated  layout.  This  would  be  done  one  net  at  a  time.  Then  maybe 
Rent’s  rule  and  the  left  edge  algorithm  would  yield  better  wire  areas. 

In  the  area  of  using  the  model,  one  suggestion  is  to  completely  automate  the 
process.  The  user  would  have  a  set  of  models  to  choose  from  and  would  only  have 
to  enter  a  high-level  description  of  the  target  design  into  the  system.  This  high-level 
description  would  be  automatically  compiled  and  translated  into  the  proper  format. 
The  user  selected  model  would  be  applied  and  area  and  delay  estimates  would  be 
produced.  Automating  the  process  would  significantly  increase  the  speed  and  reduce 
errors  from  human  mistakes.  This  in  turn  would  increase  the  overall  effectiveness  of 
high-level  synthesis  systems  using  our  method. 
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The  following  is  a  list  of  definitions  used  in  this  research: 

Architecture:  refers  to  the  instruction  set,  physical  components  and  timing  to 
which  all  hardware  implementations  must  adhere  The  architecture  of  a  given 

design  transcends  all  three  abstraction  levels:  behavioral,  structural,  and  physical. 

Implementation:  refers  to  specific  hardware  designs  using  a  particular  architecture 

(71) . 

Design  Automation:  refers  to  using  computer-based  routines  to  perform  the  design 
process 

Design  Automation  Tools:  computer-based  software  routines  that  can  handle 
designs  that  are  too  large  or  complex  to  undertake  otherwise.  Also  provides  for 
shorter  design  time,  improved  product  quality  (performance  and  reliability),  and 
reduced  product  costs 

VLSI  CAD  Design  Process:  refers  to  a  sequence  of  transformations  on  design 
representations  at  the  three  levels  of  abstraction  -  behavioral,  structural,  and  physical 

(72) . 

High  Level  Synthesis:  refers  to  the  process  where  an  abstract  behavioral  specifi¬ 
cation  of  a  digital  system  is  transformed  into  a  register  transfer  level  structure  that 
realizes  the  given  behavior 

Behavior:  refers  to  the  way  the  system  or  its  components  interact  with  their 
environment;  the  mapping  from  inputs  to  outputs  ^^b 

Structure:  refers  to  the  set  of  interconnected  components  that  make  up  the  system 

(1). 

Physical  Design:  refers  to  the  process  of  reducing  a  structural  description  of  a 
given  design  down  to  the  geometric  layout  of  an  integrated  circuit  (7^). 

Physical  Layout:  refers  to  a  set  of  planar  geometric  shapes  in  several  layers  that 
represent  the  low  level  components  and  interconnections  of  a  given  design  (^“^b 


130 


Placement:  refers  to  the  problem  of  assigning  locations  to  fixed  blocks  that  repre¬ 
sent  components  on  a  layout  surface  Relative  placement  refers  to  the  placement 
of  two  components  relative  to  each  other. 

Floorplanning:  refers  to  a  placement  problem  where  the  blocks  that  represent 
components  are  not  fixed 

Machine  Learning:  refers  to  using  a  computer  program  to  acquire  knowledge 
May  also  refer  to  using  a  computer  program  as  a  data-analysis  tool  for  scientists  and 
other  investigators 

Inductive  Learning:  refers  to  the  inference  of  a  generalized  conclusion  from  par¬ 
ticular  instances  provided  by  a  teacher  or  environment  (^^b 

Features:  refers  to  a  set  of  attribute-value  pairs  that  describe  the  characteristics 
of  the  domain  to  be  learned  ^^^b 

Training  instance:  refers  to  a  set  of  features  that  describe  one  example  from  the 
domain  to  be  learned  ^^^b  Each  training  instance  has  the  same  set  of  attributes 
describing  the  features,  but  with  different  values  depending  on  the  characteristics  of 
that  instance. 

Training  set:  refers  to  a  set  of  training  instances  ^^^b 

Concept:  refers  to  an  abstract  or  generic  classification  generalized  from  particular 
instances  ^^^b 

Data:  refers  to  words  and  or  numbers  which  have  specific  meaning.  Data  derive 
their  meaning  from  a  precise  name  and  agreed  upon  definition  ^'^^b 

Information:  refers  to  data  that  have  been  organized  and  arranged  to  convey 
knowledge  ^'^^b 

Method:  refers  to  a  means  or  manner  of  procedure,  a  regular  and  systematic  way 
of  accomplishing  something,  an  orderly  and  systematic  arrangement.  Also  refers  to 
a  set  of  procedures  that  follow  a  detailed,  logically  ordered  plan  ('^^b 

Methodology:  refers  to  a  system  of  principles,  practices,  and  procedures  applied 
to  a  specific  branch  of  knowledge  ^^^^b 
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This  technical  report  describes  how  to  use  the  machine  learning  method  of  mod¬ 
eling  the  input-to-output  relationships  that  result  from  using  a  layout  tool.  It  also 
describes  how  to  use  this  model  (a  set  of  rules)  in  place  of  the  layout  tool  to  quickly 
obtain  accurate  area  and  delay  estimates.  This  report  follows  an  algorithm  that  de¬ 
scribes  how  these  relationships  are  modeled  and  how  the  model  is  used  to  produce 
area  and  delay  estimates.  This  report  describes  each  step  of  the  algorithm  and  to 
clarify  the  discussion,  uses  an  example  of  modeling  an  actual  layout  tool  and  using 
the  model  to  estimate  area  and  delay. 

A  machine  learning  tool  called  Rule_Learner  learns  the  input-to-output  relation¬ 
ships  and  produces  the  set  of  rules.  Another  tool  called  Match  (written  exclusively 
for  this  application)  formats  the  training  and  target  designs  and  estimates  area  and 
delay.  In  the  estimator  mode,  the  Match  tool  calls  another  tool,  IE  (Inference  En¬ 
gine),  to  apply  the  rules  to  a  target  design  and  produce  a  list  of  predicted  layout 
concepts.  The  Match  tool  uses  this  list  to  estimate  area  and  delay.  The  target  lay¬ 
out  tool  is  Artistll  from  the  Keystone  Tool  Suite.  The  operations  of  each  of 
these  tools  are  fully  described  in  the  appendix.  The  complete  solution  architecture 
as  implemented  for  modeling  Artistll  using  these  tools  is  shown  in  Figure  44. 

All  of  the  tools  and  example  files  described  by  this  report  are  located  on  the 
machinejaguar.ee.pitt.edu,  under  the  directory  /usr/users/software/est.  The  subdi¬ 
rectory  location  of  the  tools  and  example  files  are  given  in  this  report  as  they  are 
described.  To  access  the  tools  described  in  this  report,  the  user  should  add  the 
following  line  to  their  path  environment: 


set  path  =  ($path  /usr/users/software/est/bin) 
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Delay  Area  Area  &  Delay 

Estimates 

Design  Actual  Learning  Applying 

Pre-processing  Layout  Rules  Rules 

(Common  to  all  paths)  (CPU  Hours)  (Off-line)  (CPU  Seconds) 

Figure  44  Solution  Architecture 

Modeling  Method  Algorithm 

There  are  two  parts  to  the  algorithm:  producing  the  model  and  using  the  model. 
Producing  the  model  uses  the  Rule_Learner  tool  as  a  data  analysis  tool  to  search 
through  a  set  of  training  data  looking  for  trends  and  characteristics  that  determine 
the  relationships  from  using  the  target  layout  tool.  Using  the  model  involves  applying 
the  model  to  a  target  design  to  get  predicted  layout  information  and  then  using  this 
information  to  derive  area  and  delay  estimates. 

Part  One:  Producing  the  Model 

The  first  part  of  the  algorithm  consists  of  two  procedures;  building  a  training  set 
and  analyzing  the  training  set. 
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Procedure  One:  Building  a  Training  Set 

In  this  procedure,  the  user  selects  a  set  of  training  designs  and  processes  them 
through  the  target  layout  tool  into  actual  layouts.  Information  from  the  training 
designs  and  their  corresponding  layouts  is  processed  into  a  training  set.  This  training 
set  represents  the  trends  and  characteristics  that  the  target  layout  tool  exhibits  on 
the  training  designs. 

Step  1.  The  first  step  selects  a  set  of  training  designs.  These  designs  determine 
the  characteristics  of  the  layout  tool  being  modeled.  For  this  report,  the  example 
training  set  is  made  up  of  five  designs  (4-bit  adder,  8-bit  adder,  12-bit  adder,  16-bit 
adder  and  32-bit  adder). 

Step  2.  The  second  step  processes  the  training  designs  into  a  format  the  target 
layout  tool  can  handle.  For  this  application,  this  is  done  by  several  tools  from  the 
high  level  synthesis  system  that  uses  Artistll:  the  Keystone  Tool  Suite. 

Because  Artistll  requires  representation  of  designs  in  the  GLUE  format,  the  train¬ 
ing  designs,  written  in  VHDL,  are  translated  into  the  GLUE  format  by  first  compiling 
them  using  the  tool  called  VCOMP.  VCOMP  compiles  the  designs  into  the  Interme¬ 
diate  VHDL  Format  (IVF).  Another  tool,  IVF2GLUE,  is  used  to  translate  this  IVF 
format  into  the  GLUE  format.  In  addition,  the  formatting  tool.  Match,  requires  a 
flattened  GLUE  file.  The  tool  called  Clusterll  flattens  the  GLUE  file.  For  conve¬ 
nience,  the  example  training  designs  have  already  been  processed  by  these  tools  into 
a  flattened  GLUE  format.  These  files  have  a  .fla  extension  (i.e.,  add4.fla}  and  are 
stored  in  the  directory  /usr/users/software/est/learn/train. 

Step  3.  The  third  step  processes  each  training  design  through  the  target  layout 
tool  into  an  actual  layout.  If  the  target  layout  tool  is  reconfigurable  and/or  has  user 
set  parameters,  these  should  be  set  to  the  desired  values  that  the  model  is  to  reflect. 
A  training  set  and  the  resultant  model  can  only  reflect  one  configuration  of  the  layout 
tool  using  one  set  of  parameter  values. 

For  the  example,  Artistll  processed  each  training  design  into  an  actual  lay¬ 
out  using  10,000  iterations.  The  layout  positions  for  the  nodes  in  the  training 
designs  are  listed  in  the  files  with  a  .grp  extension  (i.e.,  add4.grp)  in  directory 
/  usr  /  users /software /est /learn  /  train. 
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Step  4.  The  fourth  step  involves  processing  the  training  designs  and  their  layout 
information  into  a  training  set  data  format.  The  Match  tool  formats  the  training 
designs  based  on  format  requirements  from  the  Rule_Learner  tool. 

To  run  the  example,  copy  all  of  the  files  from  /usr/users/software/est /learn/ train 
to  a  working  directory.  Then,  to  process  a  training  design  into  the  proper  format, 
use  the  match  tool.  For  example,  to  process  the  4-bit  adder,  issue  the  command: 

match  -t  add4 

The  -t  option  puts  the  Match  tool  into  training  mode  (see  the  man  page  for 
match).  The  Match  tool  parses  the  files  add4.fla  and  add4.grp  to  get  the  relevant 
design  and  corresponding  layout  information.  Match  also  obtains  physical  informa¬ 
tion  about  the  nodes  in  the  training  designs  by  parsing  the  file  called  anodes.phy. 
Match  then  produces  a  training  data  file  called  add4.data  and  a  domain  description 
file  called  add4. names.  In  addition,  Match’s  output  to  the  screen  should  appear  like 
this; 

Filename:  add4 

Begin  processing  flattened  description  file. 

Begin  processing  layout  description  file. 

Layout:  8  rows  by  8  cols.  Number  of  nodes:  43. 

Generate  adjacency  matrix. 

Transform  distance  matrices . 

Produce  data  files. 

Repeat  this  step  for  each  of  the  remaining  training  designs:  addS,  addl2,  addl6, 
and  add32.  When  all  five  training  design  files  have  been  created,  concantenate  them 
into  one  training  set  file.  To  do  this,  issue  the  command: 


cat  add4.data  add8.data  addl2.data  addlS.data  add32.data  >  adder. data 
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This  will  produce  the  complete  training  set  in  the  file  adder,  data.  If  this  file  is  the 
same  as  the  file  example. data  in  the  directory  /usr/users/software/est/learn/train, 
then  the  process  was  properly  executed.  If  they  are  not  the  same,  the  individual 
training  data  files  should  be  deleted  and  this  step  repeated.  In  addition,  a  domain 
description  file  must  be  created.  This  can  be  done  by  copying  any  one  of  the  training 
design  domain  description  files  (i.e.,  add4.names)  to  adder. names. 

Procedure  Two:  Analyzing  the  Training  Set 

The  Rule_Learner  tool  analyzes  the  training  set  for  relationships  between  design 
features  and  layout  concepts.  A  set  of  rules  represents  any  relationships  found.  These 
relationships  indicate  trends  and  characteristics  of  the  target  layout  tool. 

RuleJLearner  searches  through  the  training  set  data  looking  for  evidence  that  sup¬ 
ports  relationships  between  the  design  features  and  the  relative  placement  concepts. 
If  the  confidence  factor  for  a  particular  relationship  meets  or  exceeds  a  user-set  thresh¬ 
old,  a  rule  describing  that  relationship  is  produced.  The  threshold  is  determined  by 
a  parameter  set  by  the  user  before  executing  the  learning  system. 

This  procedure  requires  the  files  adder. data  and  adder. names  produced  by  the 
Match  tool  earlier.  The  training  set  data  is  stored  in  the  file  adder,  data  and  the 
domain  description  data  is  stored  in  the  file  adder. names.  To  learn  the  rules  that 
represent  the  relationships  based  on  these  files,  issue  the  command: 

rule_learner  adder  0.9  0.6  0.1  6  50 

As  described  in  the  man  page  for  ruleJearner,  the  bias  parameters  listed  after  the 
file  name  tell  the  Rule_Learner  tool  to  start  searching  for  rules  with  certainty  factors 
greater  than  or  equal  than  0.9,  and  to  repeat  the  search  for  certainty  factors  greater 
than  or  equal  to  0.8,  0.7,  and  0.6  (decrement  the  threshold  by  0.1  for  each  repetition). 
Therefore,  Rule_Learner  will  perform  a  total  of  four  searches.  The  remaining  bias 
parameters  tell  Rule_Learner  to  allow  no  more  than  6  conjuncts  in  the  left-hand 
side  of  the  rules  generated  and  to  use  a  beam  width  of  50.  The  rules  produced  by 
Rule_Learner  will  be  listed  in  the  file  adder. rules.  This  file  should  be  the  same  as 
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the  file  example.rules  from  the  directory  /usr/users/software/est /learn.  (Due  to  its 
length,  the  screen  output  from  using  Rule_Learner  in  this  example  is  not  shown  here. 
However,  it  is  stored  in  the  file  example. drib  from  /usr/users/software/est /learn.  If 
the  screen  output  from  Rule_Learner  is  redirected  to  a  file,  it  should  look  the  same 
as  example. drib.) 

Part  Two:  Using  the  Model 

The  set  of  rules  produced  by  Rule_Learner  makes  up  the  model  of  the  relationships 
from  using  the  target  layout  tool.  To  apply  the  model,  the  target  design  must  be 
pre-processed  in  the  same  fashion  as  a  training  design.  However,  a  layout  is  not 
produced.  Instead,  the  model  determines  what  layout  concepts  result  from  certain 
features  in  the  design.  This  information  is  provided  to  the  Match  tool  so  it  can  derive 
area  and  delay  estimates. 

Procedure  Three:  Applying  the  Model 

There  are  three  steps  in  this  procedure.  The  Match  tool  pre-processes  the  target 
design,  calls  IE  to  apply  the  model  (set  of  rules),  and  uses  the  resultant  list  of 
predicted  layout  concepts  to  estimate  both  area  and  delay. 

Step  1.  The  first  step  pre-processes  the  target  design  into  the  proper  format.  This 
format  is  identical  to  that  for  a  training  design,  but  because  no  layout  is  produced, 
none  of  the  layout  concepts  are  included.  Pre-processing  the  target  design  into  this 
format  makes  it  straightforward  to  match  features  from  the  target  design  to  those  in 
each  of  the  rules. 

The  example  uses  a  16-bit  adder  as  an  target  design.  The  flattened  GLUE  descrip¬ 
tion  of  this  design  [addld.fla)  is  located  in  directory  /usr/users/software/est/target. 
To  perform  this  procedure,  copy  this  file  to  the  working  directory.  To  pre-process, 
apply  rules  to,  and  estimate  area  and  delay  for  the  16-bit  adder,  issue  the  following 
command: 


match  -sadder  -r8  addl6 
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(As  described  in  the  man  page  for  match,  this  particular  command  puts  Match 
into  the  estimator  mode  so  it  will  pre-process  the  16-bit  adder,  call  IE  to  apply  the 
rules  from  the  file  adder. rules,  and  estimate  area  and  delay  based  on  an  estimated 
layout  using  8  rows.) 

To  pre-process  this  target  design.  The  Match  tool  parses  the  file  addld.fla  and 
produces  a  file  called  addlG.data.  Match  also  produces  a  file  called  addl6. names. 
The  addl6.data  file  contains  the  data  against  which  the  rules  are  applied.  The 
addl6. names  file  contains  the  domain  description.  This  file  provides  a  context  in 
which  the  rules  are  applied. 

Step  2.  In  the  second  step,  the  Match  tool  automatically  calls  the  IE  tool  to 
apply  the  rules.  In  the  example,  the  rules  come  from  the  file  adder.rules.  In  this 
file,  the  rules  are  listed  in  descending  order  according  to  their  certainty  factors.  The 
IE  tool  searches  the  target  design  data  first  using,  the  rule  with  the  highest  certainty 
factor.  The  data  is  searched  again  using  the  rule  with  the  next  highest  certainty 
factor,  and  so  on.  After  all  data  has  been  searched  using  all  rules,  the  IE  tool 
produces  a  file  containing  the  predicted  layout  concepts.  For  the  example,  this  file  is 
called  adder.pred. 

Step  3.  In  the  third  step,  the  Match  tool  uses  the  list  of  predicted  layout  concepts 
from  the  file  adder.pred  to  estimate  area  and  delay.  To  estimate  area,  the  Match  tool 
produces  an  estimated  layout  based  on  the  predicted  layout  concepts.  In  the  example, 
the  Match  tool  produces  an  estimated  layout  using  8  rows  in  the  file  addld.clu.  The 
estimated  layout  file  looks  like  this: 

(test . clu 

(row  rl  nil  nil  gl22  nil  nil  nil  nil  nil  gl94  g70  g95  gl76  gl75  g94  gl84  g99 
g78  gl67  gl66  g47  gl41  g96  gl50  gl62  gl05  g66  g50  gl63  gl08  gl03  g72  gl04) 

(row  r2  gl85  gll2  gl58  gl21  gl57  gl49  gl48  g67  g68  g97  gl40  gl39  g86  gl93  gl06 
gl23  g87  gl86  gl32  gl31  gl36  gl90  gl97  g64  g51  gl54  gl59  gl09  g66  g49  gl72  gl89) 
(row  r3  gll5  gl30  g76  gl44  gl24  gl60  gl33  gl53  g88  gl20  g81  g77  gl42  gll3  g86 
g69  gll8  g54  g48  gl81  gll7  g58  g52  g91  gl45  glOO  glOl  g66  gl98  g92  g63  g74) 

(row  r4  gl38  gl29  gl61  g79  g75  gl71  g83  gl80  g90  g80  gl64  gl56  gl46  gl35  gl26 
gll4  g61  g60  g73  gl27  g59  g82  gl37  gl28  gll9  g57  gllO  g89  gl91  gl82  gl73  g71) 

(row  r6  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  gl88 
gl79  gl70  gl  gl61  gl52  gl43  g2  gl34  gl25  gll6  g3  gl07  g98  g4  g46) 

(row  r6  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  g6  g45 
g43  g42  g6  g41  g40  g39  g7  g38  g37  g36  g8  g35  g34  g33) 

(row  r7  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  g9  g32 
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g31  g30  glO  g29  g28  g27  gll  g26  g25  g24  gl2  g23  g22  g21) 

(row  r8  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  nil  gl3 

g20  gl9  gl8  gl4  gl7  gl6  gl5  gO  gl96  gl95  gl87  gl78  gl77  gl69  gl68) 

(column  al  rl  r2  r3  r4  r5  r6  r7  r8) 

) 


The  purpose  of  this  estimated  layout  is  to  provide  information  for  area  estimation. 
It  should  not  be  misconstrued  as  a  substitute  for  an  actual  layout  for  any  other 
purpose. 

Next,  the  Match  tool  uses  the  predicted  layout  concepts  to  find  the  wire  delay 
for  all  interconnections  in  the  target  design.  This  information,  along  with  individual 
node  delay  information  from  the  file  anodes.phy,  is  used  by  Match  to  determine  the 
critical  path  and  the  critical  path  delay.  In  the  example,  the  critical  path  including 
delay  information  is  listed  in  the  file  addlS.crit.  (This  file  is  too  large  to  reproduce 
here.  However,  the  format  is  identical  to  output  from  the  IRSIM  simulation  tool.) 

When  the  Match  tool  is  finished,  the  file  addld.clu  should  be  the  same  as  exam¬ 
ple,  clu  and  the  file  addld.crit  should  be  the  same  as  example,  crit.  (These  example 
files  are  located  in  the  directory  /usr/users/software/est/target  for  comparison  pur¬ 
poses.)  In  addition,  the  screen  output  from  Match  should  look  like  this: 

Filencime:  addl6 

Begin  processing  flattened  description  file. 

Layout:  8  rows  by  32  cols.  Number  of  nodes:  199. 

Generate  adjacency  matrix. 

Transform  distance  matrices. 

Produce  data  files. 

Apply  layout  tool  model  (set  of  rules)  : 

Read  256  cases  (11  attributes)  from  adder. test 
Read  31  rules  from  adder. rules 

Find  height  and  width  estimates: 

Estimated  Height  =  944  lambda. 

Estimated  Width  =  1492  lambda. 
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Find  critical  path  delay. 

Predicted  Delay [0]  =  58.92  ns. 

Predicted  Delay [1]  =  90.37  ns. 

Other  Design  Examples 

This  section  gives  the  locations  and  names  of  other  designs  that  were  used  to 
validate  the  machine  learning  method.  The  general  procedures  described  above  may 
be  repeated  for  any  and  all  of  these  designs.  These  designs  may  also  be  used  as  either 
training  designs  or  target  designs. 

These  designs  are  located  in  the  directory  /usr/users/software/est/test.  There 
are  24  test  designs,  the  same  ones  used  in  validating  the  method.  Each  design  is 
represented  by  a  flattened  GLUE  description  (file.fla).  Each  design  also  has  a  corre¬ 
sponding  layout  description  {file. grp)  so  any  design  may  be  used  as  a  training  design 
as  well  as  a  target  design.  Each  of  these  layout  descriptions  (except  for  the  16-bit 
multiplier)  were  produced  by  Artistll  using  10,000  iterations.  The  16-bit  multiplier 
layout  description  {multl6.grp)  was  produced  by  Artistll  using  only  100  iterations. 
For  completeness,  an  alternative  set  of  training  designs  with  layouts  from  Artistll  us¬ 
ing  only  100  iterations  are  stored  in  the  directory  /usr/users/software/est/test /alt. 
These  may  be  built  into  a  training  set  using  the  above  procedures.  The  resultant  set 
of  rules  would  model  the  input-to-output  relationships  that  result  from  using  Artistll 
with  only  100  iterations. 

Summary 

This  technical  report  describes  how  to  use  the  machine  learning  method  of  model¬ 
ing  input-to-output  relationships  that  result  from  using  layout  tools.  It  also  describes 
how  to  use  this  model  to  estimate  area  and  delay.  The  example  given  here  was  based 
on  modeling  the  relationships  from  using  the  layout  tool  Artistll.  It  was  also  based  on 
using  the  machine  learning  tool  Rule_Learner.  Another  tool  called  Match  was  used 
to  format  the  flattened  GLUE  descriptions  of  the  training  and  target  designs  into  the 


141 


appropriate  data  and  domain  description  files,  apply  the  model  (set  of  rules),  and 
estimate  area  and  delay.  The  Match  tool  called  another  tool,  IE,  to  actually  apply 
the  rules  and  produce  a  list  of  predicted  layout  concepts.  The  Match  tool  then  used 
the  information  in  this  file  to  estimate  the  area  and  delay. 

The  example  described  here  was  also  suplemented  by  example  files  for  comparison 
purposes.  The  same  procedure  and  tools  may  be  used  to  build  other  training  sets, 
produce  different  models  of  Artistll,  and  use  these  models  to  estimate  area  and  delay 
for  other  target  designs. 
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APPENDIX  C 


NAME 

artistll  -  a  CMOS,  two  dimensional,  gate  matrix  module  gen¬ 
erator 

SYNOPSIS 

artistll  -In  -rn  -gn  -an  -e  -f  -mcell  -dn  -q  file 
DESCRIPTION 

Artistll  is  a  CAD  tool  which  does  transistor  placement  and 
routing  for  double  layer  metal,  two  dimensional,  CMOS  gate 
matrix  layouts  using  cluster  based  simulated  annealing.  It 
works  best  for  small  and  medium  size  modules  (fewer  than  100 
gates) .  Larger  modules  require  longer  running  times  for 
satisfactory  results.  Artistll  uses  as  its  input  file  a 
GLUE  description  of  the  circuit  (file.glu)  and,  optionally, 
a  cluster  description  (file.clu).  It  outputs  a  Magic  format 
layout  file  (file. mag)  describing  the  smallest  two  dimen¬ 
sional  CMOS  gate  matrix  layout  it  found  and  a  cluster  group¬ 
ing  description  file  (file. grp)  which  lists  the  clusters 
used  in  the  that  layout. 

Arguments  are: 

-In  This  optional  argument  indicates  the  number  of  trial 
placements  to  use.  The  larger  the  number  of  trial 
placements  the  smaller  the  final  layout,  but  the  longer 
the  running  time.  Defaults  to  "1". 
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-rn  This  optional  argument  indicates  the  number  of  well 
sharing  folds  to  make.  Defaults  to  "1". 

-gn  This  optional  argument  indicates  the  number  of  layout 
permutations  to  make  between  evaluations  as  a  percen¬ 
tage  of  the  maximum  number  of  possible  permutations. 
Defaults  to  0%;  i.e.,  only  one  layout  permutation  is 

made  between  evaluations.  A  larger  percentage  will 
speed  up  the  search  process,  but  too  large  a  percentage 
may  degrade  the  layout  results.  Values  around  2%  seem 
to  work  well . 

-an  This  optional  argument  specifies  the  search  strategy  to 
use.  The  values  for  n  range  from  0  (ramp  accept  func¬ 
tion  with  zap  back  strategy)  to  5.  Defaults  to  "0". 
It  is  recommended  that  the  default  be  used  for  the  best 
layout  results. 

-e  This  optional  argument  tells  artistll  to  just  evaluate 
the  layout  without  creating  the  .mag  file  for  the  lay¬ 
out.  This  will  speed  up  the  program  considerably,  but 
cannot  be  used  to  generate  layouts,  just  layout  statis¬ 
tics.  Defaults  to  "off". 

-f  This  optional  argument  tells  artistll  to  produce  a 
flattened  GLUE  description  in  .fla.  This  description 
is  annotated  with  information  about  the  row/column 
placement  of  each  gate.  By  renaming  the  .fla  file  as 
the  .glu  file  and  the  -grp  file  as  the  .clu  file, 
artist  can,  in  one  iteration,  reproduce  the  layout. 
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-mcell 

This  optional  argument  allows  the  designer  to  tell 
artistll  which  cell  to  layout  when  the  input  GLUE  file 
contains  more  than  one  topmost  cell.  Alternatively,  if 
the  module  to  be  laid  out  is  not  the  topmost  cell  it 
can  be  designated  using  this  option. 

-dn  This  optional  argument  turns  on  the  debug  flag.  Vari¬ 
ous  values  of  n  will  give  various  debugging  information 
-  "0"  gives  no  diagnostic  information,  "1"  records  the 
status  of  only  the  "best"  iterations,  "2"  records  the 
status  of  the  other  iterations,  "4"  lists  the  final 
gate  ordering  by  column  for  the  layout  in  a  GLUE  like 
format.  Defaults  to  "0". 

-q  This  optional  argument  causes  artistll  to  run  in  quiet 
mode  so  that  the  GLUE  listing  is  suppressed. 

EXAMPLE 

To  use  artistll  to  produce  the  layout  for  the  carry  circuit 
of  a  binary  full  adder  (stored  in  carry. glu)  issue  the  com¬ 
mand 


artistll  -15  -rl  -d7  carry 

and  artistll  will  generate  the  following  diagnostics  as  well 
as  creating  the  carry. mag  file. 


**  Artistll  -  Version  2.03  - 

Last  updated  7/17/90  at  17:08:55  ** 


Options:  module (carry)  loop(5)  row(l)  debug(0x7) 
accept (ramp) 
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***!i«*!i<**Nt*!i«**=i«***  Module  Description  ♦*>(«*sit***>(!***:(:**st:N<si<** 


*** 

1 

carry (a,  b,  cin,  cout) 

2 

*** 

{ 

!|t** 

3 

*** 

_oai21(a,  b,  cin,  tO) ; 

*** 

4 

_ai2(a,  b,  tl); 

5 

_ai2(t0,  tl,  cout); 

♦  ♦♦ 

6 

*** 

} 

J(t** 

7 

*** 

8 

!(<*=(< 

ai2(i0[]  ,  iin  ,  o[]) 

*** 

9 

{ 

10 

*** 

<i0[]  k  il[]>  o[]  =  0; 

!(<** 

11 

<!i0[]  1  !il[]>  o[]  =  1; 

!(<♦* 

12 

**!(< 

} 

13 

14 

_( 

oai21(iO[],  il[],  i2[],  o[]) 

15 

*** 

{ 

*** 

16 

*** 

<(i0[]  1  il[])  &  i2[]>  o[]  =  0; 

*** 

17 

<(!i0[]  &  !il[])  1  !i2[]>  o[]  =  1; 

St'** 

18 

*** 

} 

Can't  open  carry. clu  -  dumb  mode 

numstm  ->  3  numrow  ->  1  numopr  ->  2  numvgd  ->  4 


num 

size  best 

min 

row 

col 

lev 

acp 

grp 

0 

120  -1 

-1 

10 

12 

0.000 

0.000 

0.000 

best 

-  accepted 

weights  tre  -> 

4  grd 

opr 

->♦♦  T 

1 

120  120 

120 

10 

12 

0.000 

0.000 

0.000 
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same  -  accepted 


weights  tre  -> 

4  grd 

opr 

->**  T 

2  108  120 

120 

9 

12 

0.000 

0.000 

0.000 

best  -  accepted 

weights  tre  -> 

4  grd 

->*♦ 

opr 

->**  T 

3  120  108 

108 

10 

12 

0.000 

0.000 

0.000 

worse  -  rejected 

weights  tre  ->*=*= 

cmp  grd  -> 

3 

opr  -> 

2  G 

4  120  108 

108 

10 

12 

0.000 

0.000 

0.000 

worse  -  rejected 

weights  tre  ->** 

cmp  grd  -> 

3 

opr  -> 

2  G 

5  120  108 

108 

10 

12 

0.000 

0.000 

0.000 

worse  -  rejected 

best  layout  ->  108 

carry_0(cout,  cin,  b,  a) 

{ 

<!a  y,  !b>  1  =  tl; 

<!a  #  !b>  0  =  tl; 

<!tl  y  !t0>  cout  =  1; 

<!tl  #  !t0>  cout  =  0; 

<(!a  #  !b)  I  !cin>  1  =  tO; 

<(!a  %  !b)  &  !cin>  0  =  tO; 

} 

jjt^^Ht************ Hi*  Statistics  H'**!!'********************* 


memory  used  40  kbytes 

input  time  used  0  seconds  user  0  seconds  system 
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compute  time  used  0  seconds  user  0  seconds  system 

The  search  strategy  used  in  the  example  above  is  simulated 
annealing  with  zap  back  using  a  ramp  accept  function.  With 
a  raiap  accept  function,  smaller  layouts  are  accepted  with 
some  increasing  probability,  while  larger  layouts  are 
accepted  with  some  decreasing  probability.  Other  accept 
functions  are  greedy,  cut-off,  and  step.  When  a  zap  back 
occurs  the  "best"  layout  found  so  far  (i.e.,  the  smallest 
layout  measured  in  terms  of  the  number  of  interconnect  rows 
times  the  number  of  interconnect  columns)  is  used  as  the 
starting  layout  for  the  next  annealing  move.  The  next  lay¬ 
out  may  be  "best"  layout  so  far  (which  becomes  the  zap  back 
target),  "better"  than  the  preceding  one  (which  may  be 
rejected  early  in  the  annealing  process,  accepted  later  in 
the  annealing  process,  or  replaced  by  the  zap  back  target), 
the  "same"  size  as  the  preceding  one  (which  may  be  accepted, 
rejected,  or  zapped),  or  may  be  "worse"  than  the  previous 
one  (which  may  be  accepted  early  in  the  annealing  process, 
rejected  later  in  the  annealing  process,  or  zapped).  With 
fewer  than  100  iterations  a  greedy  accept  function  is  actu¬ 
ally  used  (i.e.,  only  best,  better,  and  seune  are  accepted) 
and  with  fewer  than  1000  iterations  no  zap  back  occurs. 
Theoretically,  a  simulated  annealing  algorithm  will  converge 
to  the  global  minimum  if  run  for  a  sufficient  number  of 
trial  placements.  For  this  simple  circuit  artistll  found 
the  best  layout  on  the  second  trial  placement,  one  of  size 
108  requiring  9  layout  rows  and  12  layout  columns . 

In  this  example  artistll  runs  in  "dumb  mode"  since  no  clus¬ 
ter  file  (carry. clu)  exists.  In  dumb  mode,  a  randomly  gen¬ 
erated  cluster  tree  is  used  to  construct  annealing  moves. 
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In  smart  mode,  the  annealing  process  moves  gate  clusters  as 
specified  in  the  .clu  file.  A  .clu  file  can  be  created  with 
clusterll.  The  "weights"  diagnostics  provide  information 
about  the  number  and  type  of  layout  permutations  allowed  at 
each  iteration:  "tre"  is  the  number  of  cluster  permutations, 
"grd"  is  the  number  of  intra-cluster  permutations,  and  "opr" 
is  the  number  of  intergate  (i.e.,  transistors  within  a  gate) 
permutations.  A  "**"  entry  indicates  that  that  type  of  move 
is  not  considered  at  that  iteration.  A  trailing  T,  G,  or  □ 
(tre,  grd,  opr)  indicates  the  actual  permutation  used. 

The  layout  style  used  by  artistll  is  two  dimensional  CMOS 
gate  matrix.  A  two  dimensional  gate  matrix  extends  the  con¬ 
ventional  (one  dimensional)  gate  matrix  layout  to  include 
more  than  one  pull-up/pull-down  row  with  routing  through 
rows  and  routing  through  gates  within  a  row.  The  GLUE-like 
listing  of  the  circuit  description  given  after  the  annealing 
diagnostics  not  only  gives  the  gate  ordering  in  the  final 
gate  matrix  layout  by  interconnect  column,  but  also  provides 
information  about  the  treinsistor  ordering  by  interconnect 
row.  If  a  "#"  ("%")  is  used  in  place  of  a  "&"  ("I")  it 
means  that  that  and  (or)  operator  is  commutative  and  the 
transistors  can  be  placed  in  any  order  in  the  transistor 
vertical  (horizontal)  chain.  Otherwise,  the  gate  ordering 
is  from  left  to  right  working  from  the  Vdd  line  for  pull-ups 
and  the  GND  line  for  pull-downs.  Layout  produced  by  artis¬ 
tll  conform  to  MOSIS  Rev  6  design  rules  which  are  valid  for 
lambda's  of  1.5,  1.0,  0.8,  and  0.6.  Artistll  uses  fixed 
size  transistors  (2  lambda  by  4  lambda  for  nfets  and  2 
lambda  by  6  lambda  for  pfets)  in  the  absence  of  annotation 
information  in  the  GLUE  file,  Vdd  and  GND  lines  of  6  lambda, 
and  two  layer  metal.  Stacked  via's  are  not  supported. 
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Plugs  are  inserted  at  least  one  set  per  gate  pair  at  the 
well  boundaries.  The  plot  which  is  112  lambda  wide  by  96 
lambda  wide  corresponding  to  the  carry. mag  file  is  shown 
below.  Note  that  the  gate  and  transistor  ordering  is  that 
specified  in  the  GLUE  like  output  given  as  part  of  the  the 
diagnostics . 


FILES 

~cad/psutools/bin 
SEE  ALSO 

clusterll(l) ,  ivf2glue(l),  sim2glue(l),  glue(5) 
AUTHORS 

Robert  M.  Owens,  F-S  Fuh,  P-P  Hou,  K.  Pun 


APPENDIX  D 
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APPENDIX  D 


NAME 

rule_learner  -  a  knowledge-based,  production  rule  machine 
learning  system. 

SYNOPSIS 

rule_learner  C  file  <bias-parameters>  ] 

DESCRIPTION 

The  domain  description  (attribute/value/class  information) 
and  training  set  data  are  stored  in  files  called  file. names 
and  file. data,  respectively.  The  format  for  these  files  is 
the  same  as  for  the  C4.5  learning  system  (by  Quinlan). 

<bias-paramenters> : 

start-cf-thresh  stop-cf -thresh  stepsize  MaxConjs  BeamWidth 

Bias  Parameters: 

start-cf-thresh 

starting  threshold  for  minimum  certainty  factor  of 
a  satisfactory  rule 

stop-cf -thresh 

stopping  threshold  for  minimum  certainty  factor  of 
a  satisfactory  rule 


stepsize 
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decrement  value  for  going  from  starting  certainty 
factor  threshold  to  stopping  certainty  factor 
threshold 

MaxCon j  s 

maximum  number  of  conjuncts  allowed  in  rule  anteced¬ 
ent 

BeainWidth 

width  of  beam  in  beam  search 

rule_learner  begins  by  searching  through  the  rule  space 
(defined  by  the  training  set  data  and  domain  description) 
for  rules  with  certainty  factors  greater  than  or  equal  to 
start-cf-thresh.  After  all  rules  have  been  found,  it 
decrements  start-cf-thresh  by  stepsize  and  repeats  the  search. 
rule_learner  implements  a  simple  inductive  strengthening  heuris¬ 
tic:  learn  a  new  rule  only  if  it  covers  an  example  not  covered 
by  any  previously  learned  rule.  The  searches  are  repeated  until 
stop-cf-thresh  is  reached.  The  last  search  uses  stop-cf-thresh 
as  the  minimum  certainty  factor  of  a  satisfactory  rule.  The 
satisfactory  rules  found  are  printed  out  (in  a  human  readable 
format)  at  the  end  of  the  run. 

rule_learner  also  saves  the  rules  learned  (in  a  non-human 
readable  format)  to  file. rules.  This  file  is  used  by  the 
inference  engine  tool  (ie)  when  applying  the  rules. 

EXAMPLE 

This  example  comes  from  a  computer  aided  design  (CAD) 
domain  application  where  rule_learner  is  used  to  model  the 
input -to -output  relationships  that  result  from  using  layout 
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tools.  The  training  set  is  made  up  of  information  from 
training  designs  and  their  corresponding  layouts.  The  layouts 
were  produced  by  the  layout  tool  artistll.  The  training  set 
data  were  produced  by  a  program  called  match.  For  this  example, 
the  training  set  data  is  stored  in  the  file  adder. data  and  the 
domain  description  data  is  stored  in  the  file  adder. names.  To 
learn  the  rules  that  represent  the  relationships  based  on  these 
files,  issue  the  command: 

rule_learner  adder  0.9  0.6  0.1  6  50 

The  bias  parajneters  in  this  excimple  tell  rule_learner  to  start 
searchin  for  rules  with  certainty  factors  greater  than  or  equal 
to  0.9,  and  to  repeat  the  search  for  certainty  factors  greater 
than  or  equal  to  0.8,  0.7,  and  0.6  (decrement  the  threshold  by 
0.1  for  each  repetition).  Therefore,  rule.learner  will  perform 
a  total  of  four  searches.  The  remaining  bias  parameters  tell 
rule_learner  to  allow  no  more  than  6  conjuncts  in  the  left-hand 
side  of  the  rules  generated  and  to  use  a  beeim  width  of  50. 

rule_learner  will  output  the  following  information  to  the  screen 
as  well  as  saving  the  rules  learned  (adder . rules) : 

Read  1144  cases  (11  attributes)  from  adder. data 

Current  Bias:  Search_type:  cf -based  cf  thresh=  0.900 
MaxConjs=  6  BeamWidth=  50 

Ready  to  sort .  Queue  length  is  7 

Exs  to  cover:  1144 
Ready  to  sort.  Queue  length  is  168 
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500 

Exs  to  cover;  1144 

Ready  to  sort.  Queue  length  is  518  1000 
Exs  to  cover:  1144 
Ready  to  sort.  Queue  length  is  330 

Exs  to  cover:  1144 
Ready  to  sort.  Queue  length  is  244 

Exs  to  cover:  1144 

Ready  to  sort.  Queue  length  is  196  1500 
Exs  to  cover:  1144 
Ready  to  sort.  Queue  length  is  144 

Exs  to  cover:  1144 

Found  0  Good  Rules  (before  filtering) 

0  examples  covered  (out  of  1144) 

Found  0  Good  Rules 

POLICY:  0  new  good  rules 

score  of  0  new  good  rules  alone:  0.000000 

score  of  0  old  good  rules  alone:  0.000000 

score  of  0  old  good  rules  in  new_all:  0.000000 

POLICY:  0  rules  before  filtering 

POLICY:  0  rules  after  filtering 

Read  1144  cases  (11  attributes)  from  adder. data 


Exs-to-cover :  1144 
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POLICY:  1144  examples  to  cover 

Current  Bias:  Search.type:  cf-based  cf  thresh=  0.800 
MaxConjs=  6  BeamWidth=  50 

Ready  to  sort .  Queue  length  is  7 

Exs  to  cover:  1144 
Ready  to  sort .  Queue  length  is  168 
2000 

Exs  to  cover:  1144 
Ready  to  sort.  Queue  length  is  518 
2500 

Exs  to  cover:  1144 
Ready  to  sort .  Queue  length  is  330 

Exs  to  cover:  1144 
Ready  to  sort.  Queue  length  is  244 
3000 

Exs  to  cover:  1144 
Ready  to  sort.  Queue  length  is  196 

Exs  to  cover:  1144 
Ready  to  sort .  Queue  length  is  144 

Exs  to  cover:  1144 

Found  0  Good  Rules  (before  filtering) 

0  examples  covered  (out  of  1144) 


Found  0  Good  Rules 


POLICY :  0  new  good  rules 
score  of  0  new  good  rules  alone:  0.000000 
score  of  0  old  good  rules  alone:  0.000000 
score  of  0  old  good  rules  in  new_all:  0.000000 
POLICY:  0  rules  before  filtering 
POLICY:  0  rules  after  filtering 
*** 

Read  1144  cases  (11  attributes)  from  adder. data 

Exs-to-cover :  1144 

POLICY:  1144  examples  to  cover 

Current  Bias: 

Search.type:  cf -based 

cf  thresh=  0.700  MaxConjs=  6  BeamWidth=  50 

Ready  to  sort.  Queue  length  is  7 

Exs  to  cover:  1094 
Ready  to  sort.  Queue  length  is  167 
3500 

beam  before:  516 
beam  after:  351 
Exs  to  cover:  1036 
Ready  to  sort.  Queue  length  is  341 
4000 

beam  before:  280 
beam  after:  166 
Exs  to  cover:  983 

Ready  to  sort.  Queue  length  is  143 


beam  before:  186 
beam  after:  102 
Exs  to  cover:  983 

Ready  to  sort .  Queue  length  is  93 
4500 

beam  before:  154 
beam  after:  77 
Exs  to  cover:  983 

Ready  to  sort .  Queue  length  is  77 

beam  before:  110 
beam  after:  55 
Exs  to  cover:  983 

Ready  to  sort .  Queue  length  is  55 

beam  before:  0 
beam  after:  0 
Exs  to  cover:  983 

Found  43  Good  Rules  (before  filtering) 

161  examples  covered  (out  of  1144) 

Found  15  Good  Rules 

POLICY :  15  new  good  rules 

score  of  15  new  good  rules  alone:  -45363.078125 
score  of  0  old  good  rules  alone:  0.000000 
score  of  0  old  good  rules  in  new_all:  0.000000 
POLICY:  15  rules  before  filtering 
POLICY:  15  rules  after  filtering 


Read  1144  cases  (11  attributes)  from  adder. data 


Exs-to-cover :  1094  1038  1036  1036 

1036  1036  1036  1036  1036  1036 

983  983  983 

POLICY :  983  examples  to  cover 

Current  Bias: 

Search_type:  cf -based 

cf  thresh=  0.600  MaxConjs=  6  BeamWidth=  50 

Ready  to  sort.  Queue  length  is  7 

beam  before:  168 
beam  after:  147 
Exs  to  cover:  694 

Ready  to  sort .  Queue  length  is  139 
5000 

beam  before:  556 
beam  after:  396 
Exs  to  cover:  569 

Ready  to  sort.  Queue  length  is  376 
5500 

beam  before:  344 
beam  after:  214 
Exs  to  cover:  376 

Ready  to  sort.  Queue  length  is  193 


1036 

1017 


beam  before:  104 
beajn  after:  53 
Exs  to  cover:  376 


Ready  to  sort.  Queue  length  is  53 

beam  before:  32 
beam  after:  16 
Exs  to  cover:  376 

Ready  to  sort .  Queue  length  is  16 

beam  before:  0 
beam  after:  0 
Exs  to  cover:  376 

Found  49  Good  Rules  (before  filtering) 

768  examples  covered  (out  of  1144) 

Found  31  Good  Rules 

POLICY:  31  new  good  rules 

score  of  31  new  good  rules  alone:  -10815.149414 
score  of  15  old  good  rules  alone:  -45363.078125 
score  of  15  old  good  rules  in  new_all:  -45363.078125 
POLICY:  46  rules  before  filtering 
POLICY:  31  rules  after  filtering 

Read  1144  cases  (11  attributes)  from  adder. data 


Exs-to- 

-cover: 

1085 

1059 

727 

727 

727 

687 

638 

638 

636 

634 

472 

472 

429 

376 

376 

376 

1059 

1034 

1034 

687 

638 

638 

638 

578 

528 

528 

528 

376 

376 

376 

376 

POLICY:  376  examples  to  cover 
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Rules : 

(ntl_nt2  !=)  (nxl_nx2  !=)  (nil_ni2  !=)  ==>  close_2 

{  pos=  0.46j  neg=  0.28,  cf=  0.622 
p=59,  n=281,  tp=127,  tn=1017  } 

(com_src  no)  (al_a2  !=)  (nil_ni2  !=)  ==>  near_s 

{  pos=  0.32,  neg=  0.17,  cf=  0.636 

p=26,  n=185,  tp=81,  tn=1063  } 

(hl_h2  !=)  (nxl_nx2  !=)  (nil_ni2  !=)  ==>  close_2 

{  pos=  0.46,  neg=  0.28,  cf=  0.622 

p=59,  n=281,  tp=127,  tn=1017  } 

(com_src  no)  (al_a2  !=)  (nil_ni2  !=)  ==>  close_2 

{  pos=  0.29,  neg=  0.17,  cf=  0.622 

p=37,  n=174,  tp=127,  tn=1017  } 

(wl_w2  !=)  (nxl_nx2  !=)  (nil_ni2  !=)  ==>  close_2 

{  pos=  0.46,  neg=  0.28,  cf=  0.622 

p=59,  n=281,  tp=127,  tn=1017  } 

(al_a2  ==)  (nxl_nx2  ==)  ==>  way_out 

{  pos=  0.50,  neg=  0.29,  cf=  0.630 

p=307,  n=152,  tp=619,  tn=525  } 

(com.src  no)  (al_a2  !=)  (nxl_nx2  ==)  ==>  close_2 

{  pos=  0.20,  neg=  0.10,  cf=  0.645 
p=25,  n=104,  tp=127,  tn=1017  } 

(com.dest  no)  (nxl_nx2  !=)  (nil_ni2  !=)  ==>  close_2 
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{  pos=  0.46,  neg=  0.28,  cf=  0.618 

p=59,  n=286,  tp=127,  tn=1017  } 

(coin_src  no)  (hl_h2  !=)  (nil_ni2  !=)  ==>  near_s 

{  pos=  0.81,  neg=  0.51,  cf=  0.613 

p=66,  n=537,  tp=81,  tn=1063  }■ 

(coin_src  no)  (ntl_nt2  !=)  (nil_ni2  !=)  ==>  near_s 

{  pos=  0.81,  neg=  0.51,  cf=  0.613 

p=66,  n=537,  tp=81,  tn=1063  } 

(wl_w2  !=)  (nxl_nx2  ==)  ==>  far_l 

{  pos=  0.47,  neg=  0.28,  cf=  0.621 

p=49,  n=291,  tp=104,  tn=1040  } 

(hl_h2  !=)  (nxl_nx2  ==)  ==>  far_l 

{  pos=  0.47,  neg=  0.28,  cf=  0.621 

p=49,  n=291,  tp=104,  tn=1040  } 

(ntl_nt2  !=)  (nxl_nx2  ==)  ==>  far_l 

{  pos=  0.47,  neg=  0.28,  cf=  0.621 

p=49,  n=291,  tp=104,  tn=1040  } 

(nxl_nx2  ==)  (funl_fun2  !=)  ==>  far_l 

{  pos=  0.47,  neg=  0.28,  cf=  0.621 

p=49,  n=291,  tp=104,  tn=1040  } 

(nxl_nx2  ==)  (nil_ni2  !=)  ==>  far_l 

{  pos=  0.47,  neg=  0.28,  cf=  0.621 

p=49,  n=291,  tp=104,  tn=1040  } 


(nxl_nx2  !=)  ==>  near_s 
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{  pos=  0.68,  neg=  0.40,  cf=  0.624 

p=55,  11=424,  tp=81,  tn=1063  } 

(funl_fun2  ==)  ==>  way_oiit 

{  pos=  0.38,  neg=  0.18,  cf=  0.669 

p=233,  n=97,  tp=619,  tn=525  } 

(nil_ni2  ==)  ==>  way_out 

{  pos=  0.46,  neg=  0.21,  cf=  0.688 

p=287,  n=110,  tp=619,  tii=525  } 

(ntl_nt2  ==)  ==>  way.out 

{  pos=  0.46,  neg=  0.21,  cf=  0.686 

p=283,  n=109,  tp=619,  tn=525  } 

(wl_w2  ==)  ==>  way.out 

{  pos=  0.46,  neg=  0.21,  cf=  0.686 

p=283,  n=109,  tp=619,  tn=525  } 

(hl_h2  ==)  ==>  way.out 

{  pos=  0.46,  neg=  0.21,  cf=  0.686 

p=283,  n=109,  tp=619,  tn=525  } 

(com.src  yes)  ==>  way.out 
{  pos=  0.09,  neg=  0.04,  cf=  0.687 

p=56,  n=21,  tp=619,  tn=525  } 

(com.dest  yes)  ==>  way.out 
{  pos=  0.08,  neg=  0.02,  cf=  0.772 

p=50,  n=12,  tp=619,  tn=525  } 


(com.src  no)  (wl.w2  !=)  (nxl.nx2  ==)  ==>  close.l 
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{  pos=  0.35,  neg=  0.22,  cf=  0.611 

p=43,  n=220,  tp=123,  tn=1021  } 

(com_src  no)  (wl_w2  !=)  (nil_ni2  !=)  ==>  close_l 

{  pos=  0.78,  neg=  0.50,  cf=  0.608 

p=96,  n=507,  tp=123,  tn=1021  } 

(com_src  no)  (lil_h2  !=)  (nxl_nx2  ==)  ==>  close_l 

{  pos=  0.35,  neg=  0.22,  cf=  0.611 

p=43,  n=220,  tp=123,  tn=1021  } 

(coin_src  no)  (hl_h2  !=)  (nil_ni2  !=)  ==>  close_l 

{  pos=  0.78,  neg=  0.50,  cf=  0.608 

p=96,  n=507,  tp=123,  tn=1021  } 

(al_a2  !=)  (nxl_nx2  ==)  ==>  far_l 

{  pos=  0.34,  neg=  0.16,  cf=  0.662 

p=35,  n=171,  tp=104,  tn=1040  } 

(com_src  no)  (ntl_nt2  !=)  (nxl_nx2  ==)  ==>  close_l 

{  pos=  0.35,  neg=  0.22,  cf=  0.611 

p=43,  n=220,  tp=123,  tn=1021  } 

(coin_src  no)  (ntl_nt2  !=)  (nil_ni2  !=)  ==>  close.l 

{  pos=  0.78,  neg=  0.50,  cf=  0.608 

p=96,  n=507,  tp=123,  tn=1021  } 

(com_src  no)  (wl_w2  !=)  (nil_ni2  !=)  ==>  near_s 

{  pos=  0.81,  neg=  0.51,  cf=  0.613 

p=66,  n=537,  tp=81,  tn=1063  } 
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When  rule_learner  is  finished,  the  rules  listed  above 
will  also  be  listed  in  the  file  adder. rules.  This  file 
is  in  a  format  that  the  inference  engine  tool  (ie)  can 
use.  This  list  of  rules  represents  the  input-to-output 
relationships  that  resulted  from  using  artistll  on  a  set 
of  training  designs  including  a  4-bit,  8-bit,  12-bit,  16- 
bit,  and  32-bit  adder.  When  the  tool  ie  applies  these 
rules  to  a  target  design,  the  resultant  output  is  a  list 
of  predicted  layout  concepts  for  that  design.  These  con¬ 
cepts  make  it  possible  to  derive  area  and  delay  estimates 
for  the  target  design  without  producing  an  actual  layout. 

Overall,  once  rule_learner  produces  the  set  of  rules,  the 
tool  called  match  can  pre-processes  the  target  design,  call 
ie  to  apply  the  rules  and  produce  the  predicted  concepts 
list,  and  then  use  this  list  to  estimate  area  and  delay. 

FILES 

/usr/users/sof tware/ est/bin 
/usr/users/ software/ est/ src 
/usr/users/ software/ est/learn 
/usr/users/sof tware/est/learn/train 

SEE  ALSO 

artistll (1),  match(l),  ie(l),  clusterll(l),  glue(5) 

AUTHORS 


Don  S.  Gelosh 
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APPENDIX  E 


NAME 

match  -  a  training  set  builder  and  area  and  delay  estimator 
for  microprocessor  ALU  components  based  on  CMOS  two  dimensional, 
gate  matrix  type  layouts 

SYNOPSIS 

match  [  -t  ]  [  -smodel  -rrows  ]  [  file  ] 

DESCRIPTION 

match  is  a  VLSI  CAD  tool  that  operates  on  microprocessor  ALU 
designs,  match  performs  two  functions.  One  function  is  build¬ 
ing  training  sets  based  on  these  designs  that  train  a  machine 
learning  tool.  This  learning  tool,  called  rule_learner ,  produces 
a  set  of  rules  that  describe  the  input-to-output  relationships 
resulting  from  using  the  layout  tool,  artistll.  The  other  func¬ 
tion  is  estimating  area  and  delay  for  these  designs  using  this 
set  of  rules.  In  the  training  mode,  match  uses  as  its  input  a 
flattened  GLUE  description  of  the  training  design  (file.fla),  a 
layout  description  file  (file. grp)  that  describes  the  actual 
layout  produced  by  artistll  for  the  design,  and  a  node  descrip¬ 
tion  file  (anodes. phy)  that  describes  the  physical  characteris¬ 
tics  of  all  nodes  handled  by  artistll.  match  outputs  a  training 
data  file  (file. data)  and  a  domain  description  file  (f ile .ncimes) 
that  are  used  by  rule_learner  in  producing  the  set  of  rules. 

In  the  estimator  mode,  match  uses  as  its  input  the  set  of  rules 
(model .rules)  produced  by  the  rule_learner  program  and  outputs 
to  the  screen  area  and  delay  estimates,  match  also  produces  a 
file  describing  an  estimated  layout  (file.clu)  and  a  file  des- 


168 


cribing  the  critical  path  (file.crit). 

OPTIONS 

-t  This  argument  puts  the  match  tool  into  training  mode. 

-smodel 

This  argument  puts  the  match  tool  into  rule  application 
and  estimating  mode.  It  also  tells  match  which  model 
(set  of  rules)  to  use. 

-rrows 

This  argument  tells  match  how  many  rows  to  use  in  the 
estimated  layout  (estimator  mode  only.) 

EXAMPLE  (training  mode) 

To  use  match  to  produce  a  training  set  using  a  4-bit  adder 
training  design  (stored  in  addd.fla)  and  its  corresponding 
artistll  layout  description  (add4.grp)  issue  the  command: 

match  -t  add4 

match  will  output  the  following  information  to  the  screen  as 
well  as  create  the  files  add4.data  and  add4. names: 

Filename:  add4 

Begin  processing  flattened  description  file. 

Begin  processing  layout  description  file. 

Layout :  8  rows  by  8  cols .  Number  of  nodes :  43 . 

Generate  adjacency  matrix. 

Transform  distance  matrices. 

Produce  data  files. 
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This  process  must  be  repeated  for  the  remaining  training 
designs.  After  all  training  designs  for  a  particular  train¬ 
ing  set  have  been  processed  by  match,  all  of  the  individual 
training  data  files  (file. data)  can  be  concantenated  into 
one  large  file  which  constitutes  the  training  set.  This  large 
data  file  along  with  any  one  of  the  domain  description  files 
(file. name)  is  then  used  by  rule_learner  to  produce  the  set 
of  rules  that  model  the  input-to-output  relationships  from 
artistll . 

EXAMPLE  (estimator  mode) 

To  use  match  to  apply  the  rules  (stored  in  adder. rules)  and 
estimate  area  and  delay  for  a  16-bit  adder  target  design 
(stored  in  addl6.fla)  with  8  rows  in  the  estimated  layout 
issue  the  command: 

match  -sadder  -r8  addl6 

match  will  output  the  following  information  to  the  screen 
as  well  as  create  the  estimated  layout  (addl6.clu)  and 
critical  path  (addl6.crit)  description  files: 

Filenajne:  addl6 

Begin  processing  flattened  description  file. 

Layout:  8  rows  by  32  cols.  Number  of  nodes:  199. 

Generate  adjacency  matrix. 

Transform  distance  matrices. 

Produce  data  files. 

Apply  layout  tool  model  (set  of  rules) : 

Read  256  cases  (11  attributes)  from  adder. test 

Read  31  rules  from  adder. rules 


170 


Find  height  and  width  estimates: 

Estimated  Height  =  944  lambda. 

Estimated  Width  =  1492  lambda. 

Find  critical  path  delay. 

Predicted  Delay [0]  =  58.92  ns. 

Predicted  Delay [1]  =  90.37  ns. 

The  match  program  first  pre-processes  the  target  design  into  a 
data  file  (addl6 .data) .  Then  it  calls  a  tool  named  ie  (infer¬ 
ence  engine)  which  is  part  of  the  rule_learner  system.  The  ie 
tool  applies  the  rules  (stored  in  adder. rules)  to  the  data 
file.  As  a  result  from  applying  the  rules,  the  ie  tool  pro¬ 
duces  an  intermediate  file  (adder. pred)  which  holds  the  pre¬ 
dicted  layout  information,  match  uses  the  information  in  this 
file  to  build  an  estimated  layout  for  estimating  area  and  to 
calculate  all  of  the  node-to-node  delays  for  estimating  the 
critical  path  delay. 

FILES 

/usr/users/sof tware/est/bin 
/usr/users/software/est/ src 
/usr/users/ sof tware/est/learn 
/usr/users/sof  tware/est/leam/train 
/usr/users/software/est/target 

SEE  ALSO 

artistll(l),  rule_learner(l) ,  ie(l),  clusterll (1) ,  glue(5) 
AUTHORS 


Don  S.  Gelosh 
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APPENDIX  F 


NAME 

ie  -  a  simple  inference  engine  used  to  apply  the  rules  learned 
by  the  machine  learning  tool  called  rule_learner . 

SYNOPSIS 
ie  [  file  ] 

DESCRIPTION 

ie  is  a  simple  voting  inference  engine  which  takes  the  rules 
learned  by  the  rule_learner  tool  (f ile . rules) ,  and  applies 
them  to  the  examples  (file. test).  The  output  is  a  file 
(file.pred)  which  lists  each  example  and  its  predicted  class¬ 
ification. 

EXAMPLE 

For  an  example  that  uses  ie,  see  the  man  page  for  the  match(l) 
tool . 

NOTE 

The  current  version  of  ie  was  modified  to  operate  in  a  computer 
aided  design  (CAD)  domain  application  where  the  machine  learning 
tool  rule_learner  is  first  used  to  model  the  input-to-output 
relationships  that  result  from  using  layout  tools.  These 
relationships  are  represented  by  a  set  of  rules.  When  these 
rules  are  applied  to  a  target  design  by  running  ie,  the  result 
is  a  list  of  predicted  layout  concepts.  These  concepts  are  used 
to  estimate  area  and  delay  for  the  target  design  without  pro¬ 
ducing  an  actual  layout . 
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Another  tool  called  match  pre-processes  the  target  design,  calls 
ie  to  apply  the  rules  and  produce  the  list  of  predicted  layout 
concepts,  and  uses  this  list  to  estimate  area  and  delay. 
Therefore,  in  this  application,  ie  is  not  a  stand-alone  tool. 

FILES 

/usr/users/software/est/bin 
/usr/users/sof tware/est/ src 
/usr/users/software/est/learn 
/usr/users/software/est/learn/train 

SEE  ALSO 

artistll(l),  match(l),  rule_learner(l) ,  clusterll (1) ,  glue(5) 
AUTHORS 


Don  S.  Gelosh 
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